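# LoRA Fine-Tuning of ChatGLM3-6B on a Single GPU
This cookbook walks developers through LoRA fine-tuning of the ChatGLM3-6B model on the `AdvertiseGen` dataset, so that the model can generate professional advertising copy.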

## Hardware Requirements
GPU memory: 24GB
GPU architecture: Ampere (recommended)
System memory: 16GB
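
As a rough back-of-the-envelope check on the GPU memory figure, the sketch below estimates the space taken by the frozen base weights alone. It assumes the weights are held in a 16-bit dtype (2 bytes per parameter), which this document does not state, and it uses the total parameter count printed in the training log. Activations, LoRA gradients, optimizer state, and framework overhead come on top of this number.

```python
# Rough estimate of the memory taken by the model weights alone.
# Assumption: weights are stored in fp16/bf16, i.e. 2 bytes per parameter.
TOTAL_PARAMS = 6_245_533_696  # "all params" as reported by the training log
BYTES_PER_PARAM = 2           # assumed 16-bit storage

weight_bytes = TOTAL_PARAMS * BYTES_PER_PARAM
weight_gib = weight_bytes / 1024**3

print(f"model weights: {weight_gib:.2f} GiB")
```

Under that assumption the weights occupy a little under 12 GiB, which leaves roughly half of a 24GB card for everything else.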

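## 1. Prepare the Dataset
We fine-tune on the AdvertiseGen dataset. Download the preprocessed AdvertiseGen dataset from [Google Drive](https://drive.google.com/file/d/13_vf0xRTQsyneRKdD1bZIr93vBGOczrk/view?usp=sharing) or [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/f/b3f119a008264b1cabd1/?dl=1), extract it, and place the resulting AdvertiseGen directory under `data/` in this directory, for example:
> /media/zr/Data/Code/ChatGLM3/finetune_demo/data/AdvertiseGen

Next, run the following code to convert the dataset into the conversation format used for fine-tuning. Adjust the two paths in the final `convert_adgen` call to match where you placed the data.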

In [1]:
import json
from typing import Union
from pathlib import Path


def _resolve_path(path: Union[str, Path]) -> Path:
    return Path(path).expanduser().resolve()


def _mkdir(dir_name: Union[str, Path]):
    # Create the directory and any missing parents; do nothing if it already exists.
    dir_name = _resolve_path(dir_name)
    dir_name.mkdir(parents=True, exist_ok=True)


def convert_adgen(data_dir: Union[str, Path], save_dir: Union[str, Path]):
    def _convert(in_file: Path, out_file: Path):
        _mkdir(out_file.parent)
        with open(in_file, encoding='utf-8') as fin:
            with open(out_file, 'wt', encoding='utf-8') as fout:
                for line in fin:
                    dct = json.loads(line)
                    sample = {'conversations': [{'role': 'user', 'content': dct['content']},
                                                {'role': 'assistant', 'content': dct['summary']}]}
                    fout.write(json.dumps(sample, ensure_ascii=False) + '\n')

    data_dir = _resolve_path(data_dir)
    save_dir = _resolve_path(save_dir)

    train_file = data_dir / 'train.json'
    if train_file.is_file():
        out_file = save_dir / train_file.relative_to(data_dir)
        _convert(train_file, out_file)

    dev_file = data_dir / 'dev.json'
    if dev_file.is_file():
        out_file = save_dir / dev_file.relative_to(data_dir)
        _convert(dev_file, out_file)


convert_adgen('/root/data/AdvertiseGen', '/root/data/AdvertiseGen_fix')
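
To make the effect of `convert_adgen` concrete, here is a minimal sketch that applies the same per-line transformation to a single in-memory record instead of a file. The record is hypothetical: the `content` value follows the `key#value*key#value` pattern visible in the training log, and the `summary` text is made up for illustration.

```python
import json

# Hypothetical AdvertiseGen-style input line; the summary text is invented.
raw_line = json.dumps(
    {"content": "类型#裤*版型#宽松", "summary": "一条宽松版型的裤子。"},
    ensure_ascii=False,
)

# Same mapping as the _convert helper: content -> user turn, summary -> assistant turn.
dct = json.loads(raw_line)
sample = {
    "conversations": [
        {"role": "user", "content": dct["content"]},
        {"role": "assistant", "content": dct["summary"]},
    ]
}
converted_line = json.dumps(sample, ensure_ascii=False)
print(converted_line)
```

Each output line is therefore a self-contained two-turn conversation, with the attribute string as the user message and the advertisement text as the assistant reply.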

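## 2. Start Fine-Tuning from the Command Line with LoRA
Next, pass the prepared arguments to the script on the command line to start parameter-efficient fine-tuning: the dataset directory, the model directory, and the configuration file, in that order. Replace the interpreter path in the command (`/root/miniconda3/envs/llm/bin/python` in this run) with the absolute path to your own python3 so that the script runs in the intended environment.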
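
One way to avoid hard-coding the interpreter path is to build the command from `sys.executable`, which points at the Python running the notebook. The sketch below is only an illustration: `build_finetune_command` is a hypothetical helper, not part of the repository, and the three paths are the ones used in this run.

```python
import sys
from typing import List


def build_finetune_command(data_dir: str, model_dir: str, config_file: str) -> List[str]:
    """Return the argv list for finetune_hf.py, using the current interpreter."""
    return [sys.executable, "finetune_hf.py", data_dir, model_dir, config_file]


cmd = build_finetune_command(
    "/root/data/AdvertiseGen_fix",    # converted dataset directory
    "/root/autodl-tmp/chatglm3-6b/",  # local model directory
    "configs/lora.yaml",              # fine-tuning configuration
)
print(" ".join(cmd))
# To launch from Python instead of a shell cell: subprocess.run(cmd, check=True)
```

This only helps when the notebook kernel runs in the same environment that has the fine-tuning dependencies installed; otherwise the explicit absolute path is still needed.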

In [2]:
!/root/miniconda3/envs/llm/bin/python finetune_hf.py  /root/data/AdvertiseGen_fix  /root/autodl-tmp/chatglm3-6b/  configs/lora.yaml

Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.



Loading checkpoint shards: 100%|██████████████████| 7/7 [00:02<00:00,  3.04it/s]


trainable params: 1,949,696 || all params: 6,245,533,696 || trainable%: 0.031217444255383614
--> Model



--> model has 1.949696M params
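
The trainable-parameter count reported above can be reproduced by hand. The sketch below assumes that LoRA with rank 8 is applied only to the fused `query_key_value` projection in each of 28 transformer layers, and that this projection maps 4096 inputs to 4608 outputs. None of these values appear in the log; they are assumptions about the model shape and about `configs/lora.yaml`, and the fact that they reproduce the logged number is suggestive rather than conclusive.

```python
# Assumed model shape and LoRA settings (not shown in the log).
NUM_LAYERS = 28                # transformer layers
HIDDEN_SIZE = 4096             # input width of the fused query_key_value projection
QKV_OUT = 4096 + 2 * 2 * 128   # query (4096) plus key and value for 2 groups of 128 channels
LORA_RANK = 8                  # assumed LoRA rank r

# LoRA adds two matrices per adapted layer: A (r x in) and B (out x r).
per_layer = LORA_RANK * (HIDDEN_SIZE + QKV_OUT)
trainable = per_layer * NUM_LAYERS

ALL_PARAMS = 6_245_533_696     # "all params" from the log
trainable_pct = 100 * trainable / ALL_PARAMS

print(f"trainable params: {trainable:,} ({trainable_pct:.4f}% of all params)")
```

Under these assumptions the adapters account for about 0.03% of the parameters, which is why only a small fraction of the model needs gradients and optimizer state.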



Setting num_proc from 16 back to 1 for the train split to disable multiprocessing as it only contains one shard.

Generating train split: 114599 examples [00:00, 699535.95 examples/s]
Setting num_proc from 16 back to 1 for the validation split to disable multiprocessing as it only contains one shard.

Generating validation split: 1070 examples [00:00, 170636.30 examples/s]
Setting num_proc from 16 back to 1 for the test split to disable multiprocessing as it only contains one shard.

Generating test split: 1070 examples [00:00, 181991.29 examples/s]



Map (num_proc=16): 100%|██████| 114599/114599 [00:02<00:00, 48440.40 examples/s]


train_dataset: Dataset({
    features: ['input_ids', 'labels'],
    num_rows: 114599
})



Map (num_proc=16): 100%|███████████| 1070/1070 [00:00<00:00, 1241.57 examples/s]


val_dataset: Dataset({
    features: ['input_ids', 'output_ids'],
    num_rows: 1070
})



Map (num_proc=16): 100%|███████████| 1070/1070 [00:00<00:00, 1203.98 examples/s]


test_dataset: Dataset({
    features: ['input_ids', 'output_ids'],
    num_rows: 1070
})
--> Sanity check
           '[gMASK]': 64790 -> -100
               'sop': 64792 -> -100
          '<|user|>': 64795 -> -100
                  '': 30910 -> -100
                '\n': 13 -> -100
                  '': 30910 -> -100
                '类型': 33467 -> -100
                 '#': 31010 -> -100
                 '裤': 56532 -> -100
                 '*': 30998 -> -100
                 '版': 55090 -> -100
                 '型': 54888 -> -100
                 '#': 31010 -> -100
                '宽松': 40833 -> -100
                 '*': 30998 -> -100
                '风格': 32799 -> -100
                 '#': 31010 -> -100
                '性感': 40589 -> -100
                 '*': 30998 -> -100
                '图案': 37505 -> -100
                 '#': 31010 -> -100
                '线条': 37216 -> -100
                 '*': 30998 -> -100
                 '裤': 56532 -> -100
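
In the sanity check above, every prompt token is paired with the label `-100`. That value is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so those positions contribute nothing to the loss and the model is trained only on the response tokens. The sketch below shows the idea in plain Python. `build_labels` is a hypothetical helper rather than the function used by `finetune_hf.py`, and the response token ids are made up.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response, masking the prompt part of the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


prompt_ids = [64790, 64792, 64795, 30910, 13]  # first ids from the sanity check
response_ids = [11, 22, 33]                    # hypothetical response token ids

input_ids, labels = build_labels(prompt_ids, response_ids)
for token_id, label in zip(input_ids, labels):
    print(f"{token_id} -> {label}")
```

The two lists always have the same length, which is what lets the loss be computed position by position.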
    

max_steps is given, it will override any value given in num_train_epochs


***** Running training *****
  Num examples = 114,599
  Num Epochs = 3
  Instantaneous batch size per device = 6
  Total train batch size (w. parallel, distributed & accumulation) = 6
  Gradient Accumulation steps = 1
  Total optimization steps = 50,000
  Number of trainable parameters = 1,949,696
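
The numbers in this summary are related by simple arithmetic, which is useful when adjusting `max_steps` or the batch size. The sketch below recomputes the epoch count from the logged values and adds a rough duration estimate. The throughput of 1.5 it/s is an assumed typical value for this run, read off the progress output, not a guaranteed rate.

```python
import math

NUM_EXAMPLES = 114_599  # from the log
BATCH_SIZE = 6          # per-device batch size, single GPU
GRAD_ACCUM = 1          # gradient accumulation steps
MAX_STEPS = 50_000      # total optimization steps

# One optimization step consumes BATCH_SIZE * GRAD_ACCUM examples.
steps_per_epoch = math.ceil(NUM_EXAMPLES / (BATCH_SIZE * GRAD_ACCUM))
epochs_covered = MAX_STEPS / steps_per_epoch
epochs_reported = math.ceil(epochs_covered)

ASSUMED_ITS_PER_SEC = 1.5  # assumption: approximate speed seen in this run
eta_hours = MAX_STEPS / ASSUMED_ITS_PER_SEC / 3600

print(f"steps per epoch: {steps_per_epoch}")
print(f"epochs covered:  {epochs_covered:.2f} (reported as {epochs_reported})")
print(f"estimated time:  {eta_hours:.1f} h")
```

So 50,000 steps cover a bit more than two and a half passes over the training set, which is consistent with the log reporting `Num Epochs = 3` after `max_steps` overrides `num_train_epochs`.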

  0%|                                     | 100/50000 [01:06<9:15:29,  1.50it/s]
                                                                                
{'loss': 3.685, 'grad_norm': 1.9492013454437256, 'learning_rate': 0.000998, 'epoch': 0.01}

  0%|▏                                    | 200/50000 [02:13<9:44:38,  1.42it/s]
                                                                                
{'loss': 3.5587, 'grad_norm': 2.2072553634643555, 'learning_rate': 0.000996, 'epoch': 0.01}

  1%|▏                                    | 300/50000 [03:19<8:34:45,  1.61it/s]
                                                                                
{'loss': 3.5052, 'grad_norm': 2.241060495376587, 'learning_rate': 0.000994, 'epoch': 0.02}

  1%|▏                                    | 300/50000 [03:19<8:34:45,  1.61it/s]


{'loss': 3.4888, 'grad_norm': 2.3367817401885986, 'learning_rate': 0.000992, 'epoch': 0.02}

  1%|▎                                    | 400/50000 [04:25<8:32:30,  1.61it/s]


{'loss': 3.456, 'grad_norm': 2.226874828338623, 'learning_rate': 0.00099, 'epoch': 0.03}

  1%|▎                                    | 500/50000 [05:29<8:13:37,  1.67it/s]


{'loss': 3.4489, 'grad_norm': 2.64300274848938, 'learning_rate': 0.000988, 'epoch': 0.03}

  1%|▍                                    | 600/50000 [06:35<8:12:19,  1.67it/s]




  1%|▌                                    | 677/50000 [07:25<8:21:06,  1.64it/s]


  1%|▌                                    | 678/50000 [07:25<8:33:08,  1.60it/s]


  1%|▌                                    | 679/50000 [07:26<8:54:40,  1.54it/s]


  1%|▌                                    | 680/50000 [07:27<8:30:27,  1.61it/s]


  1%|▌                                    | 681/50000 [07:27<8:52:35,  1.54it/s]


  1%|▌                                    | 682/50000 [07:28<8:57:11,  1.53it/s]


  1%|▌                                    | 683/50000 [07:29<9:04:52,  1.51it/s]


  1%|▌                                    | 684/50000 [07:29<8:30:39,  1.61it/s]


  1%|▌                                    | 685/50000 [07:30<8:52:39,  1.54it/s]


  1%|▌                                    | 686/50000 [07:31<9:00:14,  1.52it/s]


  1%|▌                                    | 687/50000 [07:31<8:46:45,  1.56it/s]


  1%|▌                                    | 688/50000 [07:32<8:25:06,  1.63it/s]


  1%|▌                                    | 689/50000 [07:32<8:30:15,  1.61it/s]


  1%|▌                                    | 690/50000 [07:33<8:37:47,  1.59it/s]


  1%|▌                                    | 691/50000 [07:34<8:36:55,  1.59it/s]


  1%|▌                                    | 692/50000 [07:34<9:13:02,  1.49it/s]


  1%|▌                                    | 693/50000 [07:35<8:42:55,  1.57it/s]


  1%|▌                                    | 694/50000 [07:36<9:30:11,  1.44it/s]


  1%|▌                                    | 695/50000 [07:36<9:01:07,  1.52it/s]


  1%|▌                                    | 696/50000 [07:37<8:59:18,  1.52it/s]


  1%|▌                                    | 697/50000 [07:38<8:59:49,  1.52it/s]


  1%|▌                                    | 698/50000 [07:38<8:56:59,  1.53it/s]


  1%|▌                                    | 699/50000 [07:39<9:15:42,  1.48it/s]


  1%|▌                                    | 700/50000 [07:40<8:53:52,  1.54it/s]
                                                                                
{'loss': 3.4752, 'grad_norm': 2.281412124633789, 'learning_rate': 0.0009860000000000001, 'epoch': 0.04}

  1%|▌                                    | 700/50000 [07:40<8:53:52,  1.54it/s]


{'loss': 3.4416, 'grad_norm': 2.0989389419555664, 'learning_rate': 0.000984, 'epoch': 0.04}

  2%|▌                                    | 800/50000 [08:44<9:52:19,  1.38it/s]


{'loss': 3.4138, 'grad_norm': 2.2522268295288086, 'learning_rate': 0.000982, 'epoch': 0.05}

  2%|▋                                    | 900/50000 [09:49<8:43:39,  1.56it/s]


{'loss': 3.4178, 'grad_norm': 2.262876272201538, 'learning_rate': 0.00098, 'epoch': 0.05}

  2%|▋                                   | 1000/50000 [10:54<7:57:55,  1.71it/s]


{'loss': 3.4491, 'grad_norm': 2.3529934883117676, 'learning_rate': 0.000978, 'epoch': 0.06}

  2%|▊                                   | 1100/50000 [12:00<8:51:47,  1.53it/s]


{'loss': 3.4479, 'grad_norm': 2.3779749870300293, 'learning_rate': 0.000976, 'epoch': 0.06}

  2%|▊                                   | 1200/50000 [13:05<8:36:25,  1.57it/s]


{'loss': 3.4408, 'grad_norm': 2.412285327911377, 'learning_rate': 0.000974, 'epoch': 0.07}

  3%|▉                                  | 1300/50000 [14:09<10:16:29,  1.32it/s]


{'loss': 3.4528, 'grad_norm': 2.47792649269104, 'learning_rate': 0.000972, 'epoch': 0.07}

  3%|█                                   | 1500/50000 [16:19<8:47:14,  1.53it/s]
                                                                                
{'loss': 3.4376, 'grad_norm': 2.3938076496124268, 'learning_rate': 0.0009699999999999999, 'epoch': 0.08}

  3%|█▏                                  | 1600/50000 [17:23<9:30:09,  1.41it/s]
                                                                                
{'loss': 3.4089, 'grad_norm': 2.1441612243652344, 'learning_rate': 0.000968, 'epoch': 0.08}

  3%|█▏                                  | 1700/50000 [18:28<8:55:17,  1.50it/s]
                                                                                
{'loss': 3.3927, 'grad_norm': 2.466203212738037, 'learning_rate': 0.000966, 'epoch': 0.09}

  3%|█▏                                  | 1701/50000 [18:29<8:41:32,  1.54it/s]


  3%|█▏                                  | 1702/50000 [18:30<8:44:38,  1.53it/s]


  3%|█▏                                  | 1703/50000 [18:30<8:34:22,  1.56it/s]


  3%|█▏                                  | 1704/50000 [18:31<8:56:54,  1.50it/s]


  3%|█▏                                  | 1705/50000 [18:32<8:47:47,  1.53it/s]


  3%|█▏                                  | 1706/50000 [18:32<8:27:35,  1.59it/s]


  3%|█▏                                  | 1707/50000 [18:33<8:13:25,  1.63it/s]


  3%|█▏                                  | 1708/50000 [18:34<9:09:06,  1.47it/s]


  3%|█▏                                  | 1709/50000 [18:34<8:50:02,  1.52it/s]


  3%|█▏                                  | 1710/50000 [18:35<9:12:24,  1.46it/s]


  3%|█▏                                  | 1711/50000 [18:36<8:38:49,  1.55it/s]


  3%|█▏                                  | 1712/50000 [18:36<8:48:40,  1.52it/s]


  3%|█▏                                  | 1713/50000 [18:37<8:31:10,  1.57it/s]


  3%|█▏                                  | 1714/50000 [18:38<9:24:30,  1.43it/s]


  3%|█▏                                  | 1715/50000 [18:38<9:01:40,  1.49it/s]


  3%|█▏                                  | 1716/50000 [18:39<9:16:59,  1.44it/s]


  3%|█▏                                  | 1717/50000 [18:40<8:35:55,  1.56it/s]


  3%|█▏                                  | 1718/50000 [18:40<8:20:12,  1.61it/s]


  3%|█▏                                  | 1719/50000 [18:41<8:24:28,  1.60it/s]


  3%|█▏                                  | 1720/50000 [18:42<9:08:24,  1.47it/s]


  3%|█▏                                  | 1721/50000 [18:42<9:04:22,  1.48it/s]


  3%|█▏                                  | 1722/50000 [18:43<9:15:41,  1.45it/s]


  3%|█▏                                  | 1723/50000 [18:44<9:22:15,  1.43it/s]


  3%|█▏                                  | 1724/50000 [18:44<9:00:13,  1.49it/s]


  3%|█▏                                  | 1725/50000 [18:45<8:29:37,  1.58it/s]


  3%|█▏                                  | 1726/50000 [18:46<8:53:02,  1.51it/s]


  3%|█▏                                  | 1727/50000 [18:46<8:32:13,  1.57it/s]


  3%|█▏                                  | 1728/50000 [18:47<8:22:08,  1.60it/s]


  3%|█▏                                  | 1729/50000 [18:48<9:09:58,  1.46it/s]


  3%|█▏                                  | 1730/50000 [18:48<8:57:56,  1.50it/s]


  3%|█▏                                  | 1731/50000 [18:49<8:41:37,  1.54it/s]


  3%|█▏                                  | 1732/50000 [18:50<9:07:00,  1.47it/s]


  3%|█▏                                  | 1733/50000 [18:50<8:54:26,  1.51it/s]


  3%|█▏                                  | 1734/50000 [18:51<8:28:41,  1.58it/s]


  3%|█▏                                  | 1735/50000 [18:51<8:30:17,  1.58it/s]


  3%|█▏                                  | 1736/50000 [18:52<8:17:44,  1.62it/s]


  3%|█▎                                  | 1737/50000 [18:53<8:08:59,  1.64it/s]


  3%|█▎                                  | 1738/50000 [18:53<8:05:45,  1.66it/s]


  4%|█▎                                  | 1800/50000 [19:34<9:04:49,  1.47it/s]
                                                                                
{'loss': 3.4621, 'grad_norm': 2.5757670402526855, 'learning_rate': 0.000964, 'epoch': 0.09}

  4%|█▎                                  | 1900/50000 [20:39<9:29:10,  1.41it/s]
                                                                                
{'loss': 3.4155, 'grad_norm': 2.1340949535369873, 'learning_rate': 0.000962, 'epoch': 0.1}

  4%|█▍                                  | 2000/50000 [21:43<8:23:33,  1.59it/s]
                                                                                
{'loss': 3.3959, 'grad_norm': 2.145294189453125, 'learning_rate': 0.00096, 'epoch': 0.1}

  4%|█▌                                  | 2100/50000 [22:46<8:45:34,  1.52it/s]
                                                                                
{'loss': 3.4144, 'grad_norm': 2.543454170227051, 'learning_rate': 0.000958, 'epoch': 0.11}



  4%|█▌                                  | 2156/50000 [23:22<8:14:37,  1.61it/s]


  4%|█▌                                  | 2157/50000 [23:23<8:22:40,  1.59it/s]


  4%|█▌                                  | 2158/50000 [23:23<8:32:22,  1.56it/s]


  4%|█▌                                  | 2159/50000 [23:24<8:54:52,  1.49it/s]


  4%|█▌                                  | 2160/50000 [23:25<8:44:18,  1.52it/s]


  4%|█▌                                  | 2161/50000 [23:25<8:39:56,  1.53it/s]


  4%|█▌                                  | 2162/50000 [23:26<9:03:07,  1.47it/s]


  4%|█▌                                  | 2163/50000 [23:27<8:59:07,  1.48it/s]


  4%|█▌                                  | 2164/50000 [23:27<8:32:13,  1.56it/s]


  4%|█▌                                  | 2165/50000 [23:28<8:32:59,  1.55it/s]


  4%|█▌                                  | 2166/50000 [23:29<8:12:27,  1.62it/s]


  4%|█▌                                  | 2167/50000 [23:29<8:08:26,  1.63it/s]


  4%|█▌                                  | 2168/50000 [23:30<8:10:20,  1.63it/s]


  4%|█▌                                  | 2169/50000 [23:31<8:34:53,  1.55it/s]


  4%|█▌                                  | 2170/50000 [23:31<8:24:12,  1.58it/s]


  4%|█▌                                  | 2171/50000 [23:32<8:22:52,  1.59it/s]


  4%|█▌                                  | 2172/50000 [23:32<8:15:49,  1.61it/s]


  4%|█▌                                  | 2173/50000 [23:33<8:02:15,  1.65it/s]


  4%|█▌                                  | 2174/50000 [23:34<7:53:52,  1.68it/s]


  4%|█▌                                  | 2175/50000 [23:34<7:52:22,  1.69it/s]


  4%|█▌                                  | 2176/50000 [23:35<8:47:16,  1.51it/s]


  4%|█▌                                  | 2177/50000 [23:36<8:29:50,  1.56it/s]


  4%|█▌                                  | 2178/50000 [23:36<8:07:08,  1.64it/s]


  4%|█▌                                  | 2179/50000 [23:37<8:32:40,  1.55it/s]


  4%|█▌                                  | 2180/50000 [23:37<8:36:27,  1.54it/s]


  4%|█▌                                  | 2181/50000 [23:38<8:11:43,  1.62it/s]


  4%|█▌                                  | 2182/50000 [23:39<8:03:30,  1.65it/s]


  4%|█▌                                  | 2183/50000 [23:39<8:45:12,  1.52it/s]


  4%|█▌                                  | 2184/50000 [23:40<8:44:59,  1.52it/s]


  4%|█▌                                  | 2185/50000 [23:41<9:03:35,  1.47it/s]


  4%|█▌                                  | 2186/50000 [23:41<8:59:23,  1.48it/s]


  4%|█▌                                  | 2187/50000 [23:42<8:35:22,  1.55it/s]


  4%|█▌                                  | 2188/50000 [23:43<8:17:04,  1.60it/s]


  4%|█▌                                  | 2189/50000 [23:43<8:08:32,  1.63it/s]


  4%|█▌                                  | 2190/50000 [23:44<8:14:39,  1.61it/s]


  4%|█▌                                  | 2191/50000 [23:44<8:02:32,  1.65it/s]


  4%|█▌                                  | 2192/50000 [23:45<7:42:34,  1.72it/s]


  4%|█▌                                  | 2193/50000 [23:45<7:27:13,  1.78it/s]


  4%|█▌                                  | 2194/50000 [23:46<7:24:45,  1.79it/s]


  4%|█▌                                  | 2195/50000 [23:47<8:08:58,  1.63it/s]


  4%|█▌                                  | 2196/50000 [23:47<8:02:52,  1.65it/s]


  4%|█▌                                  | 2197/50000 [23:48<8:00:10,  1.66it/s]


  4%|█▌                                  | 2198/50000 [23:48<7:50:48,  1.69it/s]


  4%|█▌                                  | 2199/50000 [23:49<8:13:02,  1.62it/s]


  4%|█▌                                  | 2200/50000 [23:50<8:25:23,  1.58it/s]
                                                                                
{'loss': 3.4336, 'grad_norm': 2.453179121017456, 'learning_rate': 0.0009559999999999999, 'epoch': 0.12}

  4%|█▌                                  | 2200/50000 [23:50<8:25:23,  1.58it/s]


{'loss': 3.3779, 'grad_norm': 2.2879526615142822, 'learning_rate': 0.000954, 'epoch': 0.12}

  5%|█▋                                  | 2300/50000 [24:55<9:05:59,  1.46it/s]


{'loss': 3.3718, 'grad_norm': 2.5939042568206787, 'learning_rate': 0.0009519999999999999, 'epoch': 0.13}

  5%|█▋                                  | 2400/50000 [26:00<8:24:10,  1.57it/s]


{'loss': 3.3836, 'grad_norm': 2.323286771774292, 'learning_rate': 0.00095, 'epoch': 0.13}

  5%|█▊                                  | 2600/50000 [28:11<8:19:57,  1.58it/s]
                                                                                
{'loss': 3.4049, 'grad_norm': 2.5335865020751953, 'learning_rate': 0.000948, 'epoch': 0.14}

  5%|█▉                                  | 2700/50000 [29:16<9:02:09,  1.45it/s]
                                                                                
{'loss': 3.4149, 'grad_norm': 2.563560724258423, 'learning_rate': 0.000946, 'epoch': 0.14}

  6%|██                                  | 2800/50000 [30:21<8:31:40,  1.54it/s]
                                                                                
{'loss': 3.4057, 'grad_norm': 2.430704355239868, 'learning_rate': 0.000944, 'epoch': 0.15}



  6%|██                                  | 2897/50000 [31:24<9:28:56,  1.38it/s]


  6%|██                                  | 2898/50000 [31:25<9:26:47,  1.39it/s]


  6%|██                                  | 2899/50000 [31:26<9:05:57,  1.44it/s]


  6%|██                                  | 2900/50000 [31:27<9:03:03,  1.45it/s]
                                                                                
{'loss': 3.3754, 'grad_norm': 2.3022711277008057, 'learning_rate': 0.000942, 'epoch': 0.15}

  6%|██▏                                 | 3000/50000 [32:31<8:55:54,  1.46it/s]
                                                                                
{'loss': 3.3854, 'grad_norm': 2.3062198162078857, 'learning_rate': 0.00094, 'epoch': 0.16}

  6%|██▏                                 | 3100/50000 [33:35<8:01:43,  1.62it/s]
                                                                                
{'loss': 3.3749, 'grad_norm': 2.572348117828369, 'learning_rate': 0.0009379999999999999, 'epoch': 0.16}

  6%|██▎                                 | 3200/50000 [34:41<8:25:34,  1.54it/s]
                                                                                
{'loss': 3.4511, 'grad_norm': 2.505192995071411, 'learning_rate': 0.0009360000000000001, 'epoch': 0.17}



  7%|██▎                                 | 3266/50000 [35:24<8:58:52,  1.45it/s]


  7%|██▎                                 | 3267/50000 [35:25<8:24:57,  1.54it/s]


  7%|██▎                                 | 3268/50000 [35:25<7:56:42,  1.63it/s]


  7%|██▎                                 | 3269/50000 [35:26<8:03:27,  1.61it/s]


  7%|██▎                                 | 3270/50000 [35:27<8:03:24,  1.61it/s]


  7%|██▎                                 | 3271/50000 [35:27<8:07:40,  1.60it/s]


  7%|██▎                                 | 3272/50000 [35:28<8:00:23,  1.62it/s]


  7%|██▎                                 | 3273/50000 [35:28<7:53:41,  1.64it/s]


  7%|██▎                                 | 3274/50000 [35:29<7:40:26,  1.69it/s]


  7%|██▎                                 | 3275/50000 [35:30<8:00:31,  1.62it/s]


  7%|██▎                                 | 3276/50000 [35:30<8:27:00,  1.54it/s]


  7%|██▎                                 | 3277/50000 [35:31<8:22:44,  1.55it/s]


  7%|██▎                                 | 3278/50000 [35:32<8:22:42,  1.55it/s]


  7%|██▎                                 | 3279/50000 [35:32<9:05:58,  1.43it/s]


  7%|██▎                                 | 3280/50000 [35:33<9:11:12,  1.41it/s]


  7%|██▎                                 | 3281/50000 [35:34<8:40:27,  1.50it/s]


  7%|██▎                                 | 3282/50000 [35:35<9:06:47,  1.42it/s]


  7%|██▎                                 | 3283/50000 [35:35<8:32:56,  1.52it/s]


  7%|██▎                                 | 3284/50000 [35:36<8:37:22,  1.50it/s]


  7%|██▎                                 | 3285/50000 [35:36<8:33:26,  1.52it/s]


  7%|██▎                                 | 3286/50000 [35:37<7:58:57,  1.63it/s]


  7%|██▎                                 | 3287/50000 [35:38<8:27:51,  1.53it/s]


  7%|██▎                                 | 3288/50000 [35:38<8:06:16,  1.60it/s]


  7%|██▎                                 | 3289/50000 [35:39<8:13:05,  1.58it/s]


  7%|██▎                                 | 3290/50000 [35:40<8:23:51,  1.55it/s]


  7%|██▎                                 | 3291/50000 [35:40<8:46:36,  1.48it/s]


  7%|██▎                                 | 3292/50000 [35:41<8:42:40,  1.49it/s]


  7%|██▎                                 | 3293/50000 [35:42<8:26:40,  1.54it/s]


  7%|██▎                                 | 3294/50000 [35:42<8:33:18,  1.52it/s]


  7%|██▎                                 | 3295/50000 [35:43<8:54:52,  1.46it/s]


  7%|██▎                                 | 3296/50000 [35:44<8:35:15,  1.51it/s]


  7%|██▎                                 | 3297/50000 [35:44<8:33:25,  1.52it/s]


  7%|██▎                                 | 3298/50000 [35:45<8:37:58,  1.50it/s]


  7%|██▍                                 | 3299/50000 [35:46<8:16:08,  1.57it/s]


  7%|██▍                                 | 3300/50000 [35:46<8:06:38,  1.60it/s]
                                                                                
{'loss': 3.3727, 'grad_norm': 2.6721649169921875, 'learning_rate': 0.000934, 'epoch': 0.17}



  7%|██▍                                 | 3301/50000 [35:47<7:40:18,  1.69it/s]


  7%|██▍                                 | 3302/50000 [35:47<7:40:10,  1.69it/s]


  7%|██▍                                 | 3303/50000 [35:48<8:54:05,  1.46it/s]


  7%|██▍                                 | 3304/50000 [35:49<8:40:45,  1.49it/s]


  7%|██▍                                 | 3305/50000 [35:49<8:18:04,  1.56it/s]


  7%|██▍                                 | 3306/50000 [35:50<8:22:43,  1.55it/s]


  7%|██▍                                 | 3307/50000 [35:51<9:22:10,  1.38it/s]


  7%|██▍                                 | 3308/50000 [35:52<9:10:48,  1.41it/s]


  7%|██▍                                 | 3309/50000 [35:52<8:36:10,  1.51it/s]


  7%|██▍                                 | 3310/50000 [35:53<8:35:15,  1.51it/s]


  7%|██▍                                 | 3311/50000 [35:53<8:02:38,  1.61it/s]


  7%|██▍                                 | 3312/50000 [35:54<8:23:28,  1.55it/s]


  7%|██▍                                 | 3313/50000 [35:55<8:09:47,  1.59it/s]


  7%|██▍                                 | 3314/50000 [35:55<7:53:04,  1.64it/s]


  7%|██▍                                 | 3315/50000 [35:56<8:24:37,  1.54it/s]


  7%|██▍                                 | 3316/50000 [35:57<8:40:29,  1.49it/s]


  7%|██▍                                 | 3317/50000 [35:57<8:18:53,  1.56it/s]


  7%|██▍                                 | 3318/50000 [35:58<7:49:39,  1.66it/s]


  7%|██▍                                 | 3319/50000 [35:58<7:30:45,  1.73it/s]


  7%|██▍                                 | 3320/50000 [35:59<8:06:08,  1.60it/s]


  7%|██▍                                 | 3321/50000 [36:00<8:00:37,  1.62it/s]


  7%|██▍                                 | 3322/50000 [36:00<8:28:24,  1.53it/s]


  7%|██▍                                 | 3323/50000 [36:01<8:30:58,  1.52it/s]


  7%|██▍                                 | 3324/50000 [36:01<7:59:35,  1.62it/s]


  7%|██▍                                 | 3325/50000 [36:02<8:32:02,  1.52it/s]


  7%|██▍                                 | 3326/50000 [36:03<8:17:19,  1.56it/s]


  7%|██▍                                 | 3327/50000 [36:04<9:19:25,  1.39it/s]


  7%|██▍                                 | 3328/50000 [36:04<8:53:12,  1.46it/s]


  7%|██▍                                 | 3329/50000 [36:05<8:49:44,  1.47it/s]


  7%|██▍                                 | 3330/50000 [36:06<8:19:22,  1.56it/s]


  7%|██▍                                 | 3331/50000 [36:06<8:11:09,  1.58it/s]


  7%|██▍                                 | 3332/50000 [36:07<8:15:50,  1.57it/s]


  7%|██▍                                 | 3333/50000 [36:07<8:02:54,  1.61it/s]


  7%|██▍                                 | 3334/50000 [36:08<8:05:45,  1.60it/s]


  7%|██▍                                 | 3335/50000 [36:09<8:07:35,  1.60it/s]


  7%|██▍                                 | 3336/50000 [36:09<8:37:31,  1.50it/s]


  7%|██▍                                 | 3337/50000 [36:10<8:35:43,  1.51it/s]


  7%|██▍                                 | 3338/50000 [36:11<8:40:21,  1.49it/s]


  7%|██▍                                 | 3339/50000 [36:11<8:25:49,  1.54it/s]


  7%|██▍                                 | 3340/50000 [36:12<8:14:13,  1.57it/s]


  7%|██▍                                 | 3341/50000 [36:13<7:54:34,  1.64it/s]


  7%|██▍                                 | 3342/50000 [36:13<7:49:28,  1.66it/s]


  7%|██▍                                 | 3343/50000 [36:14<7:40:07,  1.69it/s]


  7%|██▍                                 | 3344/50000 [36:15<9:05:11,  1.43it/s]


  7%|██▍                                 | 3345/50000 [36:15<8:23:26,  1.54it/s]


  7%|██▍                                 | 3346/50000 [36:16<8:18:28,  1.56it/s]


  7%|██▍                                 | 3347/50000 [36:16<8:04:48,  1.60it/s]


  7%|██▍                                 | 3348/50000 [36:17<8:06:56,  1.60it/s]


  7%|██▍                                 | 3349/50000 [36:18<8:32:39,  1.52it/s]


  7%|██▍                                 | 3350/50000 [36:19<8:56:36,  1.45it/s]


  7%|██▍                                 | 3351/50000 [36:19<8:37:15,  1.50it/s]


  7%|██▍                                 | 3352/50000 [36:20<8:17:58,  1.56it/s]


  7%|██▍                                 | 3353/50000 [36:20<8:38:52,  1.50it/s]


  7%|██▍                                 | 3354/50000 [36:21<8:27:59,  1.53it/s]


  7%|██▍                                 | 3355/50000 [36:22<8:27:32,  1.53it/s]


  7%|██▍                                 | 3356/50000 [36:22<8:12:33,  1.58it/s]


  7%|██▍                                 | 3357/50000 [36:23<8:30:16,  1.52it/s]


  7%|██▍                                 | 3358/50000 [36:24<8:45:55,  1.48it/s]


  7%|██▍                                 | 3359/50000 [36:24<8:57:00,  1.45it/s]


  7%|██▍                                 | 3360/50000 [36:25<8:34:35,  1.51it/s]


  7%|██▍                                 | 3361/50000 [36:26<8:26:15,  1.54it/s]


  7%|██▍                                 | 3362/50000 [36:26<8:05:44,  1.60it/s]


  7%|██▍                                 | 3363/50000 [36:27<8:09:04,  1.59it/s]


  7%|██▍                                 | 3364/50000 [36:28<8:32:31,  1.52it/s]


  7%|██▍                                 | 3365/50000 [36:28<8:18:55,  1.56it/s]


  7%|██▍                                 | 3366/50000 [36:29<8:27:09,  1.53it/s]


  7%|██▍                                 | 3367/50000 [36:30<8:24:27,  1.54it/s]


  7%|██▍                                 | 3368/50000 [36:30<8:55:15,  1.45it/s]


  7%|██▍                                 | 3369/50000 [36:31<9:00:03,  1.44it/s]


  7%|██▍                                 | 3370/50000 [36:32<8:29:20,  1.53it/s]


  7%|██▍                                 | 3371/50000 [36:32<9:12:42,  1.41it/s]


  7%|██▍                                 | 3372/50000 [36:33<9:46:50,  1.32it/s]


  7%|██▍                                 | 3373/50000 [36:34<9:17:41,  1.39it/s]


  7%|██▍                                 | 3374/50000 [36:34<8:33:00,  1.51it/s]


  7%|██▍                                 | 3375/50000 [36:35<8:08:22,  1.59it/s]


  7%|██▍                                 | 3376/50000 [36:35<7:38:18,  1.70it/s]


  7%|██▍                                 | 3377/50000 [36:36<8:52:24,  1.46it/s]


  7%|██▍                                 | 3378/50000 [36:37<8:26:11,  1.54it/s]


  7%|██▍                                 | 3379/50000 [36:37<7:56:33,  1.63it/s]


  7%|██▍                                 | 3380/50000 [36:38<8:47:25,  1.47it/s]


  7%|██▍                                 | 3381/50000 [36:39<8:28:20,  1.53it/s]


  7%|██▍                                 | 3382/50000 [36:39<8:06:32,  1.60it/s]


  7%|██▍                                 | 3383/50000 [36:40<8:10:09,  1.59it/s]


  7%|██▍                                 | 3384/50000 [36:41<8:44:18,  1.48it/s]


  7%|██▍                                 | 3385/50000 [36:42<8:36:28,  1.50it/s]


  7%|██▍                                 | 3386/50000 [36:42<9:40:36,  1.34it/s]


  7%|██▍                                 | 3387/50000 [36:43<9:34:15,  1.35it/s]


  7%|██▍                                 | 3388/50000 [36:44<9:07:14,  1.42it/s]


  7%|██▍                                 | 3389/50000 [36:45<9:15:10,  1.40it/s]


  7%|██▍                                 | 3390/50000 [36:45<9:25:11,  1.37it/s]


  7%|██▍                                 | 3391/50000 [36:46<8:51:59,  1.46it/s]


  7%|██▍                                 | 3392/50000 [36:46<8:24:11,  1.54it/s]


  7%|██▍                                 | 3393/50000 [36:47<9:03:28,  1.43it/s]


  7%|██▍                                 | 3394/50000 [36:48<8:51:27,  1.46it/s]


  7%|██▍                                 | 3395/50000 [36:49<9:08:37,  1.42it/s]


  7%|██▍                                 | 3396/50000 [36:49<8:39:07,  1.50it/s]


  7%|██▍                                 | 3397/50000 [36:50<8:57:00,  1.45it/s]


  7%|██▍                                 | 3398/50000 [36:51<8:51:17,  1.46it/s]


  7%|██▍                                 | 3399/50000 [36:51<8:36:58,  1.50it/s]


  7%|██▍                                 | 3400/50000 [36:52<8:23:18,  1.54it/s]
                                                                                
{'loss': 3.3903, 'grad_norm': 2.527125597000122, 'learning_rate': 0.0009320000000000001, 'epoch': 0.18}



  7%|██▍                                 | 3401/50000 [36:53<8:24:15,  1.54it/s]


  7%|██▍                                 | 3402/50000 [36:53<8:32:12,  1.52it/s]


  7%|██▍                                 | 3403/50000 [36:54<8:15:38,  1.57it/s]


  7%|██▍                                 | 3404/50000 [36:55<8:41:30,  1.49it/s]


  7%|██▍                                 | 3405/50000 [36:55<8:41:28,  1.49it/s]


  7%|██▍                                 | 3406/50000 [36:56<8:41:13,  1.49it/s]


  7%|██▍                                 | 3407/50000 [36:57<8:38:30,  1.50it/s]


  7%|██▍                                 | 3408/50000 [36:57<8:35:16,  1.51it/s]


  7%|██▍                                 | 3409/50000 [36:58<8:16:58,  1.56it/s]


  7%|██▍                                 | 3410/50000 [36:58<8:05:31,  1.60it/s]


  7%|██▍                                 | 3411/50000 [36:59<7:51:39,  1.65it/s]


  7%|██▍                                 | 3412/50000 [37:00<8:01:02,  1.61it/s]


  7%|██▍                                 | 3413/50000 [37:00<8:36:16,  1.50it/s]


  7%|██▍                                 | 3414/50000 [37:01<8:18:48,  1.56it/s]


  7%|██▍                                 | 3415/50000 [37:02<8:36:53,  1.50it/s]


  7%|██▍                                 | 3416/50000 [37:02<7:53:54,  1.64it/s]


  7%|██▍                                 | 3417/50000 [37:03<7:58:17,  1.62it/s]


  7%|██▍                                 | 3418/50000 [37:03<7:59:53,  1.62it/s]


  7%|██▍                                 | 3419/50000 [37:04<7:48:47,  1.66it/s]


  7%|██▍                                 | 3420/50000 [37:05<8:27:02,  1.53it/s]


  7%|██▍                                 | 3421/50000 [37:05<8:11:57,  1.58it/s]


  7%|██▍                                 | 3422/50000 [37:06<7:54:03,  1.64it/s]


  7%|██▍                                 | 3423/50000 [37:07<7:48:50,  1.66it/s]


  7%|██▍                                 | 3424/50000 [37:07<7:42:19,  1.68it/s]


  7%|██▍                                 | 3425/50000 [37:08<7:53:53,  1.64it/s]


  7%|██▍                                 | 3426/50000 [37:08<7:42:59,  1.68it/s]


  7%|██▍                                 | 3427/50000 [37:09<7:31:00,  1.72it/s]


  7%|██▍                                 | 3428/50000 [37:10<8:11:24,  1.58it/s]


  7%|██▍                                 | 3429/50000 [37:10<8:11:54,  1.58it/s]


  7%|██▍                                 | 3430/50000 [37:11<8:11:50,  1.58it/s]


  7%|██▍                                 | 3431/50000 [37:12<8:23:12,  1.54it/s]


  7%|██▍                                 | 3432/50000 [37:12<7:59:01,  1.62it/s]


  7%|██▍                                 | 3433/50000 [37:13<7:46:31,  1.66it/s]


  7%|██▍                                 | 3434/50000 [37:13<7:45:02,  1.67it/s]


  7%|██▍                                 | 3435/50000 [37:14<7:33:47,  1.71it/s]


  7%|██▍                                 | 3436/50000 [37:14<7:43:33,  1.67it/s]


  7%|██▍                                 | 3437/50000 [37:15<7:37:15,  1.70it/s]


  7%|██▍                                 | 3438/50000 [37:16<7:40:04,  1.69it/s]


  7%|██▍                                 | 3439/50000 [37:16<8:33:34,  1.51it/s]


  7%|██▍                                 | 3440/50000 [37:17<8:56:52,  1.45it/s]


  7%|██▍                                 | 3441/50000 [37:18<8:47:27,  1.47it/s]


  7%|██▍                                 | 3442/50000 [37:18<8:35:06,  1.51it/s]


  7%|██▍                                 | 3443/50000 [37:19<8:26:34,  1.53it/s]


  7%|██▍                                 | 3444/50000 [37:20<9:44:04,  1.33it/s]


  7%|██▍                                 | 3445/50000 [37:21<8:57:01,  1.44it/s]


  7%|██▍                                 | 3446/50000 [37:21<8:44:42,  1.48it/s]


  7%|██▍                                 | 3447/50000 [37:22<8:37:36,  1.50it/s]


  7%|██▍                                 | 3448/50000 [37:23<8:36:52,  1.50it/s]


  7%|██▍                                 | 3449/50000 [37:23<8:56:11,  1.45it/s]


  7%|██▍                                 | 3450/50000 [37:24<8:32:12,  1.51it/s]


  7%|██▍                                 | 3451/50000 [37:25<8:45:32,  1.48it/s]


  7%|██▍                                 | 3452/50000 [37:25<8:42:26,  1.48it/s]


  7%|██▍                                 | 3453/50000 [37:26<8:58:31,  1.44it/s]


  7%|██▍                                 | 3454/50000 [37:27<8:35:26,  1.51it/s]


  7%|██▍                                | 3455/50000 [37:28<10:07:44,  1.28it/s]


  7%|██▍                                 | 3456/50000 [37:28<9:03:58,  1.43it/s]


  7%|██▍                                 | 3457/50000 [37:29<8:37:59,  1.50it/s]


  7%|██▍                                 | 3458/50000 [37:29<8:31:43,  1.52it/s]


  7%|██▍                                 | 3459/50000 [37:30<8:23:10,  1.54it/s]


  7%|██▍                                 | 3460/50000 [37:31<8:30:19,  1.52it/s]


  7%|██▍                                 | 3461/50000 [37:31<8:25:07,  1.54it/s]


  7%|██▍                                 | 3462/50000 [37:32<8:02:13,  1.61it/s]


  7%|██▍                                 | 3463/50000 [37:33<7:57:55,  1.62it/s]


  7%|██▍                                 | 3464/50000 [37:33<8:22:02,  1.54it/s]


  7%|██▍                                 | 3465/50000 [37:34<8:08:47,  1.59it/s]


  7%|██▍                                 | 3466/50000 [37:34<7:56:45,  1.63it/s]


  7%|██▍                                 | 3467/50000 [37:35<7:42:24,  1.68it/s]


  7%|██▍                                 | 3468/50000 [37:36<8:20:14,  1.55it/s]


  7%|██▍                                 | 3469/50000 [37:36<8:01:48,  1.61it/s]


  7%|██▍                                 | 3470/50000 [37:37<7:53:18,  1.64it/s]


  7%|██▍                                 | 3471/50000 [37:38<9:02:24,  1.43it/s]


  7%|██▍                                 | 3472/50000 [37:38<8:56:13,  1.45it/s]


  7%|██▌                                 | 3473/50000 [37:39<8:48:57,  1.47it/s]


  7%|██▌                                 | 3474/50000 [37:40<8:30:38,  1.52it/s]


  7%|██▌                                 | 3475/50000 [37:40<8:19:12,  1.55it/s]


  7%|██▌                                 | 3476/50000 [37:41<8:38:13,  1.50it/s]


  7%|██▌                                 | 3477/50000 [37:42<8:30:38,  1.52it/s]


  7%|██▌                                 | 3478/50000 [37:43<9:30:11,  1.36it/s]


  7%|██▌                                 | 3479/50000 [37:43<9:50:54,  1.31it/s]


  7%|██▌                                 | 3480/50000 [37:44<9:28:42,  1.36it/s]


  7%|██▌                                 | 3481/50000 [37:45<8:55:34,  1.45it/s]


  7%|██▌                                 | 3482/50000 [37:45<8:26:30,  1.53it/s]


  7%|██▌                                 | 3483/50000 [37:46<8:07:41,  1.59it/s]


  7%|██▌                                 | 3484/50000 [37:47<8:20:57,  1.55it/s]


  7%|██▌                                 | 3485/50000 [37:47<8:39:30,  1.49it/s]


  7%|██▌                                 | 3486/50000 [37:48<9:13:57,  1.40it/s]


  7%|██▌                                 | 3487/50000 [37:49<8:36:23,  1.50it/s]


  7%|██▌                                 | 3488/50000 [37:49<8:28:42,  1.52it/s]


  7%|██▌                                 | 3489/50000 [37:50<8:07:11,  1.59it/s]


  7%|██▌                                 | 3490/50000 [37:50<7:52:00,  1.64it/s]


  7%|██▌                                 | 3491/50000 [37:51<7:51:35,  1.64it/s]


  7%|██▌                                 | 3492/50000 [37:52<8:15:05,  1.57it/s]


  7%|██▌                                 | 3493/50000 [37:52<8:25:37,  1.53it/s]


  7%|██▌                                 | 3494/50000 [37:53<8:24:06,  1.54it/s]


  7%|██▌                                 | 3495/50000 [37:54<8:45:32,  1.47it/s]


  7%|██▌                                 | 3496/50000 [37:54<8:01:59,  1.61it/s]


  7%|██▌                                 | 3497/50000 [37:55<8:07:27,  1.59it/s]


  7%|██▌                                 | 3498/50000 [37:56<8:09:50,  1.58it/s]


  7%|██▌                                 | 3499/50000 [37:56<8:15:56,  1.56it/s]


  7%|██▌                                 | 3500/50000 [37:57<7:49:27,  1.65it/s]
                                                                                
{'loss': 3.401, 'grad_norm': 2.4717135429382324, 'learning_rate': 0.00093, 'epoch': 0.18}



  7%|██▌                                 | 3501/50000 [37:58<8:30:13,  1.52it/s]


  7%|██▌                                 | 3502/50000 [37:58<8:34:33,  1.51it/s]


  7%|██▌                                 | 3503/50000 [37:59<8:54:23,  1.45it/s]


  7%|██▌                                 | 3504/50000 [38:00<8:41:20,  1.49it/s]


  7%|██▌                                 | 3505/50000 [38:00<8:34:59,  1.50it/s]


  7%|██▌                                 | 3506/50000 [38:01<8:18:53,  1.55it/s]


  7%|██▌                                 | 3507/50000 [38:01<7:55:56,  1.63it/s]


  7%|██▌                                 | 3508/50000 [38:02<7:42:49,  1.67it/s]


  7%|██▌                                 | 3509/50000 [38:03<8:10:10,  1.58it/s]


  7%|██▌                                 | 3510/50000 [38:03<8:19:02,  1.55it/s]


  7%|██▌                                 | 3511/50000 [38:04<8:34:03,  1.51it/s]


  7%|██▌                                 | 3512/50000 [38:05<8:16:35,  1.56it/s]


  7%|██▌                                 | 3513/50000 [38:05<8:14:49,  1.57it/s]


  7%|██▌                                 | 3514/50000 [38:06<8:10:36,  1.58it/s]


  7%|██▌                                 | 3515/50000 [38:07<8:54:25,  1.45it/s]


  7%|██▌                                 | 3516/50000 [38:07<8:50:05,  1.46it/s]


  7%|██▌                                 | 3517/50000 [38:08<8:38:11,  1.50it/s]


  7%|██▌                                 | 3518/50000 [38:09<9:36:12,  1.34it/s]


  7%|██▌                                 | 3519/50000 [38:10<9:22:18,  1.38it/s]


  7%|██▌                                 | 3520/50000 [38:10<8:41:43,  1.48it/s]


  7%|██▌                                 | 3521/50000 [38:11<8:42:29,  1.48it/s]


  7%|██▌                                 | 3522/50000 [38:11<8:35:58,  1.50it/s]


  7%|██▌                                 | 3523/50000 [38:12<8:52:46,  1.45it/s]


  7%|██▌                                 | 3524/50000 [38:13<9:02:59,  1.43it/s]


  7%|██▌                                 | 3525/50000 [38:14<8:56:25,  1.44it/s]


  7%|██▌                                 | 3526/50000 [38:14<8:44:34,  1.48it/s]


  7%|██▌                                 | 3527/50000 [38:15<8:15:30,  1.56it/s]


  7%|██▌                                 | 3528/50000 [38:16<9:06:43,  1.42it/s]


  7%|██▌                                 | 3529/50000 [38:16<8:41:44,  1.48it/s]


  7%|██▌                                 | 3530/50000 [38:17<8:20:28,  1.55it/s]


  7%|██▌                                 | 3531/50000 [38:18<8:27:37,  1.53it/s]


  7%|██▌                                 | 3532/50000 [38:18<8:26:44,  1.53it/s]


  7%|██▌                                 | 3533/50000 [38:19<8:31:57,  1.51it/s]


  7%|██▌                                 | 3534/50000 [38:19<8:26:52,  1.53it/s]


  7%|██▌                                 | 3535/50000 [38:20<7:50:15,  1.65it/s]


  7%|██▌                                 | 3536/50000 [38:21<8:21:36,  1.54it/s]


  7%|██▌                                 | 3537/50000 [38:21<8:40:48,  1.49it/s]


  7%|██▌                                 | 3538/50000 [38:22<8:34:32,  1.50it/s]


  7%|██▌                                 | 3539/50000 [38:23<8:35:57,  1.50it/s]


  7%|██▌                                 | 3540/50000 [38:23<7:59:35,  1.61it/s]


  7%|██▌                                 | 3541/50000 [38:24<9:07:23,  1.41it/s]


  7%|██▌                                 | 3542/50000 [38:25<9:07:55,  1.41it/s]


  7%|██▌                                 | 3543/50000 [38:26<8:49:00,  1.46it/s]


  7%|██▌                                 | 3544/50000 [38:26<8:29:50,  1.52it/s]


  7%|██▌                                 | 3545/50000 [38:27<8:14:54,  1.56it/s]


  7%|██▌                                 | 3546/50000 [38:27<8:07:53,  1.59it/s]


  7%|██▌                                 | 3547/50000 [38:28<8:29:24,  1.52it/s]


  7%|██▌                                 | 3548/50000 [38:29<9:07:25,  1.41it/s]


  7%|██▌                                 | 3549/50000 [38:29<8:38:22,  1.49it/s]


  7%|██▌                                 | 3550/50000 [38:30<8:32:42,  1.51it/s]


  7%|██▌                                 | 3551/50000 [38:31<8:33:22,  1.51it/s]


  7%|██▌                                 | 3552/50000 [38:31<8:50:21,  1.46it/s]


  7%|██▌                                 | 3553/50000 [38:32<8:41:00,  1.49it/s]


  7%|██▌                                 | 3554/50000 [38:33<8:39:01,  1.49it/s]


  7%|██▌                                 | 3555/50000 [38:33<8:18:40,  1.55it/s]


  7%|██▌                                 | 3556/50000 [38:34<9:17:47,  1.39it/s]


  7%|██▌                                 | 3557/50000 [38:35<9:24:27,  1.37it/s]


  7%|██▌                                 | 3558/50000 [38:36<9:13:47,  1.40it/s]


  7%|██▌                                 | 3559/50000 [38:36<9:03:26,  1.42it/s]


  7%|██▌                                 | 3560/50000 [38:37<8:47:30,  1.47it/s]


  7%|██▌                                 | 3561/50000 [38:38<8:53:56,  1.45it/s]


  7%|██▌                                 | 3562/50000 [38:38<8:51:21,  1.46it/s]


  7%|██▌                                 | 3563/50000 [38:39<8:40:52,  1.49it/s]


  7%|██▌                                 | 3564/50000 [38:40<8:35:06,  1.50it/s]


  7%|██▌                                 | 3565/50000 [38:40<8:16:14,  1.56it/s]


  7%|██▌                                 | 3566/50000 [38:41<8:20:39,  1.55it/s]


  7%|██▌                                 | 3567/50000 [38:42<8:37:59,  1.49it/s]


  7%|██▌                                 | 3568/50000 [38:42<9:01:32,  1.43it/s]


  7%|██▌                                 | 3569/50000 [38:43<8:53:49,  1.45it/s]


  7%|██▌                                 | 3570/50000 [38:44<9:09:39,  1.41it/s]


  7%|██▌                                 | 3571/50000 [38:45<9:04:03,  1.42it/s]


  7%|██▌                                 | 3572/50000 [38:45<8:39:28,  1.49it/s]


  7%|██▌                                 | 3573/50000 [38:46<9:05:28,  1.42it/s]


  7%|██▌                                 | 3574/50000 [38:47<8:58:08,  1.44it/s]


  7%|██▌                                 | 3575/50000 [38:47<8:25:09,  1.53it/s]


  7%|██▌                                 | 3576/50000 [38:48<8:38:58,  1.49it/s]


  7%|██▌                                 | 3577/50000 [38:49<9:00:50,  1.43it/s]


  7%|██▌                                 | 3578/50000 [38:49<8:32:47,  1.51it/s]


  7%|██▌                                 | 3579/50000 [38:50<8:51:48,  1.45it/s]


  7%|██▌                                 | 3580/50000 [38:51<8:46:34,  1.47it/s]


  7%|██▌                                 | 3581/50000 [38:51<9:23:58,  1.37it/s]


  7%|██▌                                 | 3582/50000 [38:52<8:51:19,  1.46it/s]


  7%|██▌                                 | 3583/50000 [38:53<8:37:22,  1.50it/s]


  7%|██▌                                 | 3584/50000 [38:53<8:27:41,  1.52it/s]


  7%|██▌                                 | 3585/50000 [38:54<8:25:17,  1.53it/s]


  7%|██▌                                 | 3586/50000 [38:55<8:39:43,  1.49it/s]


  7%|██▌                                 | 3587/50000 [38:55<8:42:37,  1.48it/s]


  7%|██▌                                 | 3588/50000 [38:56<8:18:08,  1.55it/s]


  7%|██▌                                 | 3589/50000 [38:57<8:34:39,  1.50it/s]


  7%|██▌                                 | 3590/50000 [38:57<8:31:34,  1.51it/s]


  7%|██▌                                 | 3591/50000 [38:58<8:14:47,  1.56it/s]


  7%|██▌                                 | 3592/50000 [38:59<8:16:07,  1.56it/s]


  7%|██▌                                 | 3593/50000 [38:59<7:59:34,  1.61it/s]


  7%|██▌                                 | 3594/50000 [39:00<8:01:04,  1.61it/s]


  7%|██▌                                 | 3595/50000 [39:00<8:24:40,  1.53it/s]


  7%|██▌                                 | 3596/50000 [39:01<7:54:33,  1.63it/s]


  7%|██▌                                 | 3597/50000 [39:02<8:02:08,  1.60it/s]


  7%|██▌                                 | 3598/50000 [39:02<8:49:49,  1.46it/s]


  7%|██▌                                 | 3599/50000 [39:03<8:46:25,  1.47it/s]


  7%|██▌                                 | 3600/50000 [39:04<8:58:03,  1.44it/s]
                                                                                
{'loss': 3.324, 'grad_norm': 2.4401516914367676, 'learning_rate': 0.0009280000000000001, 'epoch': 0.19}



  7%|██▌                                 | 3601/50000 [39:04<8:30:30,  1.51it/s]


  7%|██▌                                 | 3602/50000 [39:05<8:15:28,  1.56it/s]


  7%|██▌                                 | 3603/50000 [39:06<8:22:35,  1.54it/s]


  7%|██▌                                 | 3604/50000 [39:06<8:26:02,  1.53it/s]


  7%|██▌                                 | 3605/50000 [39:07<8:21:05,  1.54it/s]


  7%|██▌                                 | 3606/50000 [39:08<8:19:07,  1.55it/s]


  7%|██▌                                 | 3607/50000 [39:08<8:28:23,  1.52it/s]


  7%|██▌                                 | 3608/50000 [39:09<8:27:02,  1.52it/s]


  7%|██▌                                 | 3609/50000 [39:10<8:47:08,  1.47it/s]


  7%|██▌                                 | 3610/50000 [39:10<8:43:08,  1.48it/s]


  7%|██▌                                 | 3611/50000 [39:11<9:40:34,  1.33it/s]


  7%|██▌                                 | 3612/50000 [39:12<9:13:19,  1.40it/s]


  7%|██▌                                 | 3613/50000 [39:13<9:13:52,  1.40it/s]


  7%|██▌                                 | 3614/50000 [39:13<9:01:49,  1.43it/s]


  7%|██▌                                 | 3615/50000 [39:14<8:18:58,  1.55it/s]


  7%|██▌                                 | 3616/50000 [39:14<8:23:19,  1.54it/s]


  7%|██▌                                 | 3617/50000 [39:15<8:20:35,  1.54it/s]


  7%|██▌                                 | 3618/50000 [39:16<7:50:16,  1.64it/s]


  7%|██▌                                 | 3619/50000 [39:16<7:39:52,  1.68it/s]


  7%|██▌                                 | 3620/50000 [39:17<7:58:56,  1.61it/s]


  7%|██▌                                 | 3621/50000 [39:18<7:59:18,  1.61it/s]


  7%|██▌                                 | 3622/50000 [39:18<7:59:33,  1.61it/s]


  7%|██▌                                 | 3623/50000 [39:19<8:08:31,  1.58it/s]


  7%|██▌                                 | 3624/50000 [39:19<8:14:56,  1.56it/s]


  7%|██▌                                 | 3625/50000 [39:20<8:20:57,  1.54it/s]


  7%|██▌                                 | 3626/50000 [39:21<8:38:55,  1.49it/s]


  7%|██▌                                 | 3627/50000 [39:21<8:32:25,  1.51it/s]


  7%|██▌                                 | 3628/50000 [39:22<9:17:26,  1.39it/s]


  7%|██▌                                 | 3629/50000 [39:23<9:01:57,  1.43it/s]


  7%|██▌                                 | 3630/50000 [39:24<8:33:29,  1.51it/s]


  7%|██▌                                 | 3631/50000 [39:24<8:29:29,  1.52it/s]


  7%|██▌                                 | 3632/50000 [39:25<8:24:27,  1.53it/s]


  7%|██▌                                 | 3633/50000 [39:25<8:07:11,  1.59it/s]


  7%|██▌                                 | 3634/50000 [39:26<8:11:20,  1.57it/s]


  7%|██▌                                 | 3635/50000 [39:27<7:56:33,  1.62it/s]


  7%|██▌                                 | 3636/50000 [39:27<7:44:45,  1.66it/s]


  7%|██▌                                 | 3637/50000 [39:28<7:58:24,  1.62it/s]


  7%|██▌                                 | 3638/50000 [39:28<7:53:14,  1.63it/s]


  7%|██▌                                 | 3639/50000 [39:29<8:07:21,  1.59it/s]


  7%|██▌                                 | 3640/50000 [39:30<7:54:56,  1.63it/s]


  7%|██▌                                 | 3641/50000 [39:30<7:48:49,  1.65it/s]


  7%|██▌                                 | 3642/50000 [39:31<8:05:02,  1.59it/s]


  7%|██▌                                 | 3643/50000 [39:32<7:57:32,  1.62it/s]


  7%|██▌                                 | 3644/50000 [39:32<8:20:10,  1.54it/s]


  7%|██▌                                 | 3645/50000 [39:33<8:37:52,  1.49it/s]


  7%|██▋                                 | 3646/50000 [39:34<8:21:48,  1.54it/s]


  7%|██▋                                 | 3647/50000 [39:34<9:09:26,  1.41it/s]


  7%|██▋                                 | 3648/50000 [39:35<9:01:40,  1.43it/s]


  7%|██▋                                 | 3649/50000 [39:36<9:16:03,  1.39it/s]


  7%|██▋                                 | 3650/50000 [39:37<8:44:17,  1.47it/s]


  7%|██▋                                 | 3651/50000 [39:37<8:24:47,  1.53it/s]


  7%|██▋                                 | 3652/50000 [39:38<8:27:38,  1.52it/s]


  7%|██▋                                 | 3653/50000 [39:38<8:21:07,  1.54it/s]


  7%|██▋                                 | 3654/50000 [39:39<8:59:59,  1.43it/s]


  7%|██▋                                 | 3655/50000 [39:40<8:53:27,  1.45it/s]


  7%|██▋                                 | 3656/50000 [39:40<8:25:36,  1.53it/s]


  7%|██▋                                 | 3657/50000 [39:41<8:00:02,  1.61it/s]


  7%|██▋                                 | 3658/50000 [39:42<7:57:33,  1.62it/s]


  7%|██▋                                 | 3659/50000 [39:42<7:45:15,  1.66it/s]


  7%|██▋                                 | 3660/50000 [39:43<7:40:44,  1.68it/s]


  7%|██▋                                 | 3661/50000 [39:43<8:00:53,  1.61it/s]


  7%|██▋                                 | 3662/50000 [39:44<7:54:31,  1.63it/s]


  7%|██▋                                 | 3663/50000 [39:45<7:57:26,  1.62it/s]


  7%|██▋                                 | 3664/50000 [39:45<7:46:19,  1.66it/s]


  7%|██▋                                 | 3665/50000 [39:46<7:41:28,  1.67it/s]


  7%|██▋                                 | 3666/50000 [39:46<7:43:46,  1.67it/s]


  7%|██▋                                 | 3667/50000 [39:47<7:32:24,  1.71it/s]


  7%|██▋                                 | 3668/50000 [39:48<7:30:16,  1.71it/s]


  7%|██▋                                 | 3669/50000 [39:48<8:16:30,  1.56it/s]


  7%|██▋                                 | 3670/50000 [39:49<8:39:51,  1.49it/s]


  7%|██▋                                 | 3671/50000 [39:50<8:39:19,  1.49it/s]


  7%|██▋                                 | 3672/50000 [39:50<8:42:07,  1.48it/s]


  7%|██▋                                 | 3673/50000 [39:51<8:59:06,  1.43it/s]


  7%|██▋                                 | 3674/50000 [39:52<8:17:00,  1.55it/s]


  7%|██▋                                 | 3675/50000 [39:52<8:32:10,  1.51it/s]


  7%|██▋                                 | 3676/50000 [39:53<8:49:14,  1.46it/s]


  7%|██▋                                 | 3677/50000 [39:54<8:49:00,  1.46it/s]


  7%|██▋                                 | 3678/50000 [39:55<8:48:36,  1.46it/s]


  7%|██▋                                 | 3679/50000 [39:55<9:06:38,  1.41it/s]


  7%|██▋                                 | 3680/50000 [39:56<9:34:33,  1.34it/s]


  7%|██▋                                 | 3681/50000 [39:57<8:54:16,  1.44it/s]


  7%|██▋                                 | 3682/50000 [39:57<8:14:58,  1.56it/s]


  7%|██▋                                 | 3683/50000 [39:58<8:13:13,  1.57it/s]


  7%|██▋                                 | 3684/50000 [39:58<7:58:46,  1.61it/s]


  7%|██▋                                 | 3685/50000 [39:59<8:06:07,  1.59it/s]


  7%|██▋                                 | 3686/50000 [40:00<8:08:23,  1.58it/s]


  7%|██▋                                 | 3687/50000 [40:00<8:08:42,  1.58it/s]


  7%|██▋                                 | 3688/50000 [40:01<8:09:09,  1.58it/s]


  7%|██▋                                 | 3689/50000 [40:02<8:21:58,  1.54it/s]


  7%|██▋                                 | 3690/50000 [40:02<8:18:39,  1.55it/s]


  7%|██▋                                 | 3691/50000 [40:03<8:20:53,  1.54it/s]


  7%|██▋                                 | 3692/50000 [40:04<8:22:00,  1.54it/s]


  7%|██▋                                 | 3693/50000 [40:04<8:08:05,  1.58it/s]


  7%|██▋                                 | 3694/50000 [40:05<8:31:09,  1.51it/s]


  7%|██▋                                 | 3695/50000 [40:06<8:36:31,  1.49it/s]


  7%|██▋                                 | 3696/50000 [40:06<8:51:02,  1.45it/s]


  7%|██▋                                 | 3697/50000 [40:07<8:32:47,  1.50it/s]


  7%|██▋                                 | 3698/50000 [40:08<8:07:27,  1.58it/s]


  7%|██▋                                 | 3699/50000 [40:08<7:56:34,  1.62it/s]


  7%|██▋                                 | 3700/50000 [40:09<7:50:19,  1.64it/s]
                                                                                
{'loss': 3.3988, 'grad_norm': 2.535391092300415, 'learning_rate': 0.0009260000000000001, 'epoch': 0.19}

  8%|██▋                                 | 3800/50000 [41:14<8:52:26,  1.45it/s]
                                                                                
{'loss': 3.3934, 'grad_norm': 2.6797447204589844, 'learning_rate': 0.000924, 'epoch': 0.2}

  8%|██▊                                 | 3900/50000 [42:20<8:31:41,  1.50it/s]
                                                                                
{'loss': 3.3907, 'grad_norm': 2.7818663120269775, 'learning_rate': 0.0009220000000000001, 'epoch': 0.2}

  8%|██▉                                 | 4000/50000 [43:26<8:41:44,  1.47it/s]
                                                                                
{'loss': 3.4136, 'grad_norm': 2.703531503677368, 'learning_rate': 0.00092, 'epoch': 0.21}

  8%|██▉                                 | 4100/50000 [44:31<8:17:05,  1.54it/s]
                                                                                
{'loss': 3.3802, 'grad_norm': 2.580595016479492, 'learning_rate': 0.0009180000000000001, 'epoch': 0.21}

  8%|███                                 | 4200/50000 [45:38<7:48:02,  1.63it/s]
                                                                                
{'loss': 3.3797, 'grad_norm': 2.4867303371429443, 'learning_rate': 0.000916, 'epoch': 0.22}

  9%|███                                 | 4300/50000 [46:41<8:05:38,  1.57it/s]
                                                                                
{'loss': 3.44, 'grad_norm': 2.312074661254883, 'learning_rate': 0.0009140000000000001, 'epoch': 0.23}



  9%|███▏                                | 4376/50000 [47:31<7:57:59,  1.59it/s]


  9%|███▏                                | 4377/50000 [47:31<8:03:12,  1.57it/s]


  9%|███▏                                | 4378/50000 [47:32<8:01:13,  1.58it/s]


  9%|███▏                                | 4379/50000 [47:33<7:41:09,  1.65it/s]


  9%|███▏                                | 4380/50000 [47:33<7:40:02,  1.65it/s]


  9%|███▏                                | 4381/50000 [47:34<7:50:41,  1.62it/s]


  9%|███▏                                | 4382/50000 [47:34<7:53:54,  1.60it/s]


  9%|███▏                                | 4383/50000 [47:35<8:14:55,  1.54it/s]


  9%|███▏                                | 4384/50000 [47:36<8:00:39,  1.58it/s]


  9%|███▏                                | 4385/50000 [47:36<8:10:54,  1.55it/s]


  9%|███▏                                | 4386/50000 [47:37<8:06:38,  1.56it/s]


  9%|███▏                                | 4387/50000 [47:38<8:13:37,  1.54it/s]


  9%|███▏                                | 4388/50000 [47:38<8:08:46,  1.56it/s]


  9%|███▏                                | 4389/50000 [47:39<7:37:37,  1.66it/s]


  9%|███▏                                | 4390/50000 [47:40<7:52:25,  1.61it/s]


  9%|███▏                                | 4391/50000 [47:40<7:55:43,  1.60it/s]


  9%|███▏                                | 4392/50000 [47:41<8:00:49,  1.58it/s]


  9%|███▏                                | 4393/50000 [47:42<8:22:07,  1.51it/s]


  9%|███▏                                | 4394/50000 [47:42<8:03:21,  1.57it/s]


  9%|███▏                                | 4395/50000 [47:43<8:08:57,  1.55it/s]


  9%|███▏                                | 4396/50000 [47:43<8:08:32,  1.56it/s]


  9%|███▏                                | 4397/50000 [47:44<7:55:17,  1.60it/s]


  9%|███▏                                | 4398/50000 [47:45<8:06:13,  1.56it/s]


  9%|███▏                                | 4399/50000 [47:45<8:16:37,  1.53it/s]


  9%|███▏                                | 4400/50000 [47:46<8:15:18,  1.53it/s]
                                                                                
{'loss': 3.4038, 'grad_norm': 2.518805980682373, 'learning_rate': 0.000912, 'epoch': 0.23}

  9%|███▏                                | 4500/50000 [48:51<7:39:58,  1.65it/s]
                                                                                
{'loss': 3.3663, 'grad_norm': 2.577329635620117, 'learning_rate': 0.00091, 'epoch': 0.24}



  9%|███▎                                | 4600/50000 [49:57<7:54:38,  1.59it/s]
                                                                                
{'loss': 3.3966, 'grad_norm': 2.7222790718078613, 'learning_rate': 0.0009080000000000001, 'epoch': 0.24}

  9%|███▍                                | 4700/50000 [51:01<8:49:11,  1.43it/s]
                                                                                
{'loss': 3.3839, 'grad_norm': 2.49208402633667, 'learning_rate': 0.000906, 'epoch': 0.25}

  9%|███▍                                | 4700/50000 [51:01<8:49:11,  1.43it/s]


 10%|███▍                                | 4800/50000 [52:04<8:22:15,  1.50it/s]
                                                                                
{'loss': 3.4049, 'grad_norm': 2.3538753986358643, 'learning_rate': 0.0009040000000000001, 'epoch': 0.25}



 10%|███▌                                | 4900/50000 [53:11<7:56:24,  1.58it/s]
                                                                                
{'loss': 3.3781, 'grad_norm': 2.627166271209717, 'learning_rate': 0.000902, 'epoch': 0.26}



 10%|███▌                                | 5000/50000 [54:16<7:16:05,  1.72it/s]
                                                                                
{'loss': 3.3465, 'grad_norm': 2.5794265270233154, 'learning_rate': 0.0009000000000000001, 'epoch': 0.26}

***** Running Evaluation *****
  Num examples = 50
  Batch size = 16




100%|█████████████████████████████████████████████| 4/4 [00:04<00:00,  1.30s/it]
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache


Loading model cost 0.831 seconds.
Prefix dict has been built successfully.



                                                                                


                                                                                
{'eval_rouge-1': 32.701170000000005, 'eval_rouge-2': 7.485713999999999, 'eval_rouge-l': 25.88236, 'eval_bleu-4': 0.03686248770142814, 'eval_runtime': 8.3452, 'eval_samples_per_second': 5.991, 'eval_steps_per_second': 0.479, 'epoch': 0.26}

 10%|███▌                                | 5000/50000 [54:24<7:16:05,  1.72it/s]

Saving model checkpoint to ./output/tmp-checkpoint-5000


tokenizer config file saved in ./output/tmp-checkpoint-5000/tokenizer_config.json
Special tokens file saved in ./output/tmp-checkpoint-5000/special_tokens_map.json





 10%|███▋                                | 5099/50000 [55:30<7:51:41,  1.59it/s]


 10%|███▋                                | 5100/50000 [55:31<8:15:27,  1.51it/s]
                                                                                
{'loss': 3.4241, 'grad_norm': 2.8174490928649902, 'learning_rate': 0.000898, 'epoch': 0.27}

 10%|███▋                                | 5100/50000 [55:31<8:15:27,  1.51it/s]


{'loss': 3.3894, 'grad_norm': 2.43112850189209, 'learning_rate': 0.000896, 'epoch': 0.27}

 10%|███▋                                | 5200/50000 [56:36<7:52:50,  1.58it/s]


{'loss': 3.3921, 'grad_norm': 2.6298649311065674, 'learning_rate': 0.000894, 'epoch': 0.28}

 11%|███▊                                | 5300/50000 [57:41<8:11:33,  1.52it/s]


{'loss': 3.3576, 'grad_norm': 2.6885898113250732, 'learning_rate': 0.000892, 'epoch': 0.28}

 11%|███▉                                | 5400/50000 [58:46<7:45:50,  1.60it/s]




 11%|███▉                                | 5468/50000 [59:30<8:59:01,  1.38it/s]


 11%|███▉                                | 5469/50000 [59:31<8:38:15,  1.43it/s]


 11%|███▉                                | 5470/50000 [59:31<8:44:03,  1.42it/s]


 11%|███▉                                | 5471/50000 [59:32<8:17:59,  1.49it/s]


 11%|███▉                                | 5472/50000 [59:33<8:19:22,  1.49it/s]


 11%|███▉                                | 5473/50000 [59:33<8:07:45,  1.52it/s]


 11%|███▉                                | 5474/50000 [59:34<8:24:00,  1.47it/s]


 11%|███▉                                | 5475/50000 [59:35<8:30:37,  1.45it/s]


 11%|███▉                                | 5476/50000 [59:35<7:46:54,  1.59it/s]


 11%|███▉                                | 5477/50000 [59:36<7:42:02,  1.61it/s]


 11%|███▉                                | 5478/50000 [59:36<7:54:09,  1.56it/s]


 11%|███▉                                | 5479/50000 [59:37<7:42:57,  1.60it/s]


 11%|███▉                                | 5480/50000 [59:38<7:25:40,  1.66it/s]


 11%|███▉                                | 5481/50000 [59:38<7:54:40,  1.56it/s]


 11%|███▉                                | 5482/50000 [59:39<8:42:34,  1.42it/s]


 11%|███▉                                | 5483/50000 [59:40<8:55:30,  1.39it/s]


 11%|███▉                                | 5484/50000 [59:41<9:26:33,  1.31it/s]


 11%|███▉                                | 5485/50000 [59:41<9:15:56,  1.33it/s]


 11%|███▉                                | 5486/50000 [59:42<9:07:05,  1.36it/s]


 11%|███▉                                | 5487/50000 [59:43<8:49:31,  1.40it/s]


 11%|███▉                                | 5488/50000 [59:44<8:39:56,  1.43it/s]


 11%|███▉                                | 5489/50000 [59:44<8:50:43,  1.40it/s]


 11%|███▉                                | 5490/50000 [59:45<8:14:12,  1.50it/s]


 11%|███▉                                | 5491/50000 [59:45<8:09:37,  1.52it/s]


 11%|███▉                                | 5492/50000 [59:46<7:36:15,  1.63it/s]


 11%|███▉                                | 5493/50000 [59:47<8:01:49,  1.54it/s]


 11%|███▉                                | 5494/50000 [59:47<7:41:24,  1.61it/s]


 11%|███▉                                | 5495/50000 [59:48<7:47:57,  1.59it/s]


 11%|███▉                                | 5496/50000 [59:49<7:58:14,  1.55it/s]


 11%|███▉                                | 5497/50000 [59:49<8:02:37,  1.54it/s]


 11%|███▉                                | 5498/50000 [59:50<7:57:16,  1.55it/s]


 11%|███▉                                | 5499/50000 [59:51<8:11:56,  1.51it/s]


 11%|███▉                                | 5500/50000 [59:51<8:47:19,  1.41it/s]
                                                                                
{'loss': 3.409, 'grad_norm': 3.080181121826172, 'learning_rate': 0.0008900000000000001, 'epoch': 0.29}

 11%|███▉                                | 5500/50000 [59:51<8:47:19,  1.41it/s]


{'loss': 3.3832, 'grad_norm': 2.724524974822998, 'learning_rate': 0.000888, 'epoch': 0.29}

 11%|███▊                              | 5600/50000 [1:00:56<7:58:22,  1.55it/s]


{'loss': 3.3786, 'grad_norm': 2.459348440170288, 'learning_rate': 0.0008860000000000001, 'epoch': 0.3}

 11%|███▉                              | 5700/50000 [1:02:01<8:32:02,  1.44it/s]


{'loss': 3.4113, 'grad_norm': 2.4726788997650146, 'learning_rate': 0.000884, 'epoch': 0.3}

 12%|███▉                              | 5800/50000 [1:03:06<7:40:18,  1.60it/s]


 12%|███▉                              | 5836/50000 [1:03:29<8:45:19,  1.40it/s]


 12%|███▉                              | 5837/50000 [1:03:30<8:33:14,  1.43it/s]


 12%|███▉                              | 5838/50000 [1:03:31<8:59:24,  1.36it/s]


 12%|███▉                              | 5839/50000 [1:03:31<8:27:25,  1.45it/s]


 12%|███▉                              | 5840/50000 [1:03:32<8:07:53,  1.51it/s]


 12%|███▉                              | 5841/50000 [1:03:33<7:52:57,  1.56it/s]


 12%|███▉                              | 5842/50000 [1:03:33<8:13:59,  1.49it/s]


 12%|███▉                              | 5843/50000 [1:03:34<7:55:54,  1.55it/s]


 12%|███▉                              | 5844/50000 [1:03:35<7:55:37,  1.55it/s]


 12%|███▉                              | 5845/50000 [1:03:35<7:38:45,  1.60it/s]


 12%|███▉                              | 5846/50000 [1:03:36<7:28:16,  1.64it/s]


 12%|███▉                              | 5847/50000 [1:03:36<7:32:07,  1.63it/s]


 12%|███▉                              | 5848/50000 [1:03:37<8:42:07,  1.41it/s]


 12%|███▉                              | 5849/50000 [1:03:38<8:25:26,  1.46it/s]


 12%|███▉                              | 5850/50000 [1:03:39<8:20:25,  1.47it/s]


 12%|███▉                              | 5851/50000 [1:03:39<8:10:08,  1.50it/s]


 12%|███▉                              | 5852/50000 [1:03:40<8:19:35,  1.47it/s]


 12%|███▉                              | 5853/50000 [1:03:41<8:07:12,  1.51it/s]


 12%|███▉                              | 5854/50000 [1:03:41<7:59:36,  1.53it/s]


 12%|███▉                              | 5855/50000 [1:03:42<7:59:30,  1.53it/s]


 12%|███▉                              | 5856/50000 [1:03:42<7:44:16,  1.58it/s]


 12%|███▉                              | 5857/50000 [1:03:43<7:32:10,  1.63it/s]


 12%|███▉                              | 5858/50000 [1:03:44<7:37:34,  1.61it/s]


 12%|███▉                              | 5859/50000 [1:03:44<8:07:05,  1.51it/s]


 12%|███▉                              | 5860/50000 [1:03:45<7:45:21,  1.58it/s]


 12%|███▉                              | 5861/50000 [1:03:46<8:18:01,  1.48it/s]


 12%|███▉                              | 5862/50000 [1:03:47<8:57:37,  1.37it/s]


 12%|███▉                              | 5863/50000 [1:03:47<8:46:23,  1.40it/s]


 12%|███▉                              | 5864/50000 [1:03:48<8:28:59,  1.45it/s]


 12%|███▉                              | 5865/50000 [1:03:49<8:20:38,  1.47it/s]


 12%|███▉                              | 5866/50000 [1:03:49<8:32:15,  1.44it/s]


 12%|███▉                              | 5867/50000 [1:03:50<8:17:11,  1.48it/s]


 12%|███▉                              | 5868/50000 [1:03:50<7:51:31,  1.56it/s]


 12%|███▉                              | 5869/50000 [1:03:51<7:37:41,  1.61it/s]


 12%|███▉                              | 5870/50000 [1:03:52<7:39:24,  1.60it/s]


 12%|███▉                              | 5871/50000 [1:03:52<8:07:23,  1.51it/s]


 12%|███▉                              | 5872/50000 [1:03:53<7:48:22,  1.57it/s]


 12%|███▉                              | 5873/50000 [1:03:54<8:11:45,  1.50it/s]


 12%|███▉                              | 5874/50000 [1:03:54<8:13:32,  1.49it/s]


 12%|███▉                              | 5875/50000 [1:03:55<8:14:57,  1.49it/s]


 12%|███▉                              | 5876/50000 [1:03:56<8:15:50,  1.48it/s]


 12%|███▉                              | 5877/50000 [1:03:56<8:13:45,  1.49it/s]


 12%|███▉                              | 5878/50000 [1:03:57<8:15:10,  1.49it/s]


 12%|███▉                              | 5879/50000 [1:03:58<7:52:27,  1.56it/s]


 12%|███▉                              | 5880/50000 [1:03:58<7:42:19,  1.59it/s]


 12%|███▉                              | 5881/50000 [1:03:59<7:36:48,  1.61it/s]


 12%|███▉                              | 5882/50000 [1:04:00<8:01:17,  1.53it/s]


 12%|████                              | 5883/50000 [1:04:00<7:47:13,  1.57it/s]


 12%|████                              | 5884/50000 [1:04:01<7:36:00,  1.61it/s]


 12%|████                              | 5885/50000 [1:04:01<7:14:55,  1.69it/s]


 12%|████                              | 5886/50000 [1:04:02<7:46:19,  1.58it/s]


 12%|████                              | 5887/50000 [1:04:03<7:31:09,  1.63it/s]


 12%|████                              | 5888/50000 [1:04:03<7:24:48,  1.65it/s]


 12%|████                              | 5889/50000 [1:04:04<7:55:46,  1.55it/s]


 12%|████                              | 5890/50000 [1:04:05<7:43:11,  1.59it/s]


 12%|████                              | 5891/50000 [1:04:05<7:32:23,  1.63it/s]


 12%|████                              | 5892/50000 [1:04:06<7:41:03,  1.59it/s]


 12%|████                              | 5893/50000 [1:04:06<7:44:10,  1.58it/s]


 12%|████                              | 5894/50000 [1:04:07<7:38:14,  1.60it/s]


 12%|████                              | 5895/50000 [1:04:08<7:51:59,  1.56it/s]


 12%|████                              | 5896/50000 [1:04:08<8:12:00,  1.49it/s]


 12%|████                              | 5897/50000 [1:04:09<7:56:04,  1.54it/s]


 12%|████                              | 5898/50000 [1:04:10<7:52:59,  1.55it/s]


 12%|████                              | 5899/50000 [1:04:10<7:32:35,  1.62it/s]


 12%|████                              | 5900/50000 [1:04:11<7:37:45,  1.61it/s]
                                                                                
{'loss': 3.372, 'grad_norm': 2.928089141845703, 'learning_rate': 0.000882, 'epoch': 0.31}

 12%|████                              | 6000/50000 [1:05:17<8:13:22,  1.49it/s]
                                                                                
{'loss': 3.4112, 'grad_norm': 2.8264877796173096, 'learning_rate': 0.00088, 'epoch': 0.31}

 12%|████▏                             | 6100/50000 [1:06:21<7:54:35,  1.54it/s]
                                                                                
{'loss': 3.3737, 'grad_norm': 2.756864547729492, 'learning_rate': 0.000878, 'epoch': 0.32}

 12%|████▏                             | 6200/50000 [1:07:26<8:04:39,  1.51it/s]
                                                                                
{'loss': 3.3713, 'grad_norm': 2.5599708557128906, 'learning_rate': 0.000876, 'epoch': 0.32}

 13%|████▎                             | 6300/50000 [1:08:31<8:12:01,  1.48it/s]
                                                                                
{'loss': 3.356, 'grad_norm': 2.5824050903320312, 'learning_rate': 0.000874, 'epoch': 0.33}

 13%|████▎                             | 6400/50000 [1:09:35<7:57:15,  1.52it/s]
                                                                                
{'loss': 3.3747, 'grad_norm': 2.6387319564819336, 'learning_rate': 0.000872, 'epoch': 0.34}

 13%|████▍                             | 6500/50000 [1:10:40<7:19:00,  1.65it/s]
                                                                                
{'loss': 3.4282, 'grad_norm': 3.6091952323913574, 'learning_rate': 0.00087, 'epoch': 0.34}



 13%|████▍                             | 6578/50000 [1:11:32<7:42:23,  1.57it/s]


 13%|████▍                             | 6579/50000 [1:11:33<8:26:14,  1.43it/s]


 13%|████▍                             | 6580/50000 [1:11:33<8:02:40,  1.50it/s]


 13%|████▍                             | 6581/50000 [1:11:34<8:00:09,  1.51it/s]


 13%|████▍                             | 6582/50000 [1:11:35<7:43:16,  1.56it/s]


 13%|████▍                             | 6583/50000 [1:11:35<7:47:54,  1.55it/s]


 13%|████▍                             | 6584/50000 [1:11:36<7:43:26,  1.56it/s]


 13%|████▍                             | 6585/50000 [1:11:37<7:49:41,  1.54it/s]


 13%|████▍                             | 6586/50000 [1:11:37<8:06:44,  1.49it/s]


 13%|████▍                             | 6587/50000 [1:11:38<8:10:12,  1.48it/s]


 13%|████▍                             | 6588/50000 [1:11:39<7:43:34,  1.56it/s]


 13%|████▍                             | 6589/50000 [1:11:39<7:45:39,  1.55it/s]


 13%|████▍                             | 6590/50000 [1:11:40<7:33:19,  1.60it/s]


 13%|████▍                             | 6591/50000 [1:11:41<7:40:45,  1.57it/s]


 13%|████▍                             | 6592/50000 [1:11:41<7:57:11,  1.52it/s]


 13%|████▍                             | 6593/50000 [1:11:42<7:51:13,  1.54it/s]


 13%|████▍                             | 6594/50000 [1:11:43<7:58:04,  1.51it/s]


 13%|████▍                             | 6595/50000 [1:11:43<7:36:13,  1.59it/s]


 13%|████▍                             | 6596/50000 [1:11:44<8:08:22,  1.48it/s]


 13%|████▍                             | 6597/50000 [1:11:45<8:32:09,  1.41it/s]


 13%|████▍                             | 6598/50000 [1:11:45<8:01:25,  1.50it/s]


 13%|████▍                             | 6599/50000 [1:11:46<8:22:24,  1.44it/s]


 13%|████▍                             | 6600/50000 [1:11:47<8:18:30,  1.45it/s]
                                                                                
{'loss': 3.3465, 'grad_norm': 3.107985496520996, 'learning_rate': 0.0008680000000000001, 'epoch': 0.35}

 13%|████▍                             | 6600/50000 [1:11:47<8:18:30,  1.45it/s]
{'loss': 3.3858, 'grad_norm': 2.7365522384643555, 'learning_rate': 0.000866, 'epoch': 0.35}

 13%|████▌                             | 6700/50000 [1:12:51<7:36:30,  1.58it/s]
{'loss': 3.3704, 'grad_norm': 2.55049204826355, 'learning_rate': 0.000864, 'epoch': 0.36}

 14%|████▌                             | 6800/50000 [1:13:57<7:47:41,  1.54it/s]
{'loss': 3.4145, 'grad_norm': 2.8937795162200928, 'learning_rate': 0.000862, 'epoch': 0.36}

 14%|████▋                             | 6900/50000 [1:15:02<7:36:05,  1.57it/s]


{'loss': 3.3635, 'grad_norm': 2.5829038619995117, 'learning_rate': 0.00086, 'epoch': 0.37}

 14%|████▊                             | 7000/50000 [1:16:07<7:30:57,  1.59it/s]


{'loss': 3.3547, 'grad_norm': 2.6282856464385986, 'learning_rate': 0.000858, 'epoch': 0.37}

 14%|████▊                             | 7100/50000 [1:17:12<8:20:27,  1.43it/s]


{'loss': 3.3783, 'grad_norm': 2.5408904552459717, 'learning_rate': 0.000856, 'epoch': 0.38}

 14%|████▉                             | 7200/50000 [1:18:19<7:25:01,  1.60it/s]


{'loss': 3.3725, 'grad_norm': 2.744786024093628, 'learning_rate': 0.000854, 'epoch': 0.38}

 15%|█████                             | 7400/50000 [1:20:29<7:58:04,  1.49it/s]
                                                                                
{'loss': 3.3556, 'grad_norm': 2.866522789001465, 'learning_rate': 0.000852, 'epoch': 0.39}

 15%|█████                             | 7500/50000 [1:21:33<7:14:54,  1.63it/s]
                                                                                
{'loss': 3.3695, 'grad_norm': 2.816802501678467, 'learning_rate': 0.00085, 'epoch': 0.39}

 15%|█████▏                            | 7600/50000 [1:22:37<7:41:24,  1.53it/s]
                                                                                
{'loss': 3.3667, 'grad_norm': 2.609954833984375, 'learning_rate': 0.000848, 'epoch': 0.4}



 15%|█████▏                            | 7688/50000 [1:23:35<7:57:30,  1.48it/s]


 15%|█████▏                            | 7689/50000 [1:23:36<7:51:50,  1.49it/s]


 15%|█████▏                            | 7690/50000 [1:23:36<7:39:50,  1.53it/s]


 15%|█████▏                            | 7691/50000 [1:23:37<7:24:44,  1.59it/s]


 15%|█████▏                            | 7692/50000 [1:23:38<7:55:04,  1.48it/s]


 15%|█████▏                            | 7693/50000 [1:23:38<7:44:53,  1.52it/s]


 15%|█████▏                            | 7694/50000 [1:23:39<7:50:57,  1.50it/s]


 15%|█████▏                            | 7695/50000 [1:23:40<7:43:14,  1.52it/s]


 15%|█████▏                            | 7696/50000 [1:23:40<7:48:27,  1.51it/s]


 15%|█████▏                            | 7697/50000 [1:23:41<7:34:50,  1.55it/s]


 15%|█████▏                            | 7698/50000 [1:23:42<7:19:53,  1.60it/s]


 15%|█████▏                            | 7699/50000 [1:23:42<7:05:25,  1.66it/s]


 15%|█████▏                            | 7700/50000 [1:23:43<7:38:47,  1.54it/s]
                                                                                
{'loss': 3.3878, 'grad_norm': 2.672886610031128, 'learning_rate': 0.000846, 'epoch': 0.4}

 16%|█████▎                            | 7800/50000 [1:24:49<7:27:12,  1.57it/s]
                                                                                
{'loss': 3.361, 'grad_norm': 2.9021952152252197, 'learning_rate': 0.000844, 'epoch': 0.41}

 16%|█████▎                            | 7900/50000 [1:25:53<7:56:34,  1.47it/s]
                                                                                
{'loss': 3.3653, 'grad_norm': 2.6380856037139893, 'learning_rate': 0.000842, 'epoch': 0.41}

 16%|█████▍                            | 8000/50000 [1:26:58<7:42:34,  1.51it/s]
                                                                                
{'loss': 3.3689, 'grad_norm': 2.3987536430358887, 'learning_rate': 0.00084, 'epoch': 0.42}



 16%|█████▍                            | 8057/50000 [1:27:35<7:51:59,  1.48it/s]


 16%|█████▍                            | 8058/50000 [1:27:35<7:49:06,  1.49it/s]


 16%|█████▍                            | 8059/50000 [1:27:36<7:40:26,  1.52it/s]


 16%|█████▍                            | 8060/50000 [1:27:37<7:54:49,  1.47it/s]


 16%|█████▍                            | 8061/50000 [1:27:37<7:40:16,  1.52it/s]


 16%|█████▍                            | 8062/50000 [1:27:38<7:30:07,  1.55it/s]


 16%|█████▍                            | 8063/50000 [1:27:38<7:13:52,  1.61it/s]


 16%|█████▍                            | 8064/50000 [1:27:39<7:26:34,  1.57it/s]


 16%|█████▍                            | 8065/50000 [1:27:40<7:27:11,  1.56it/s]


 16%|█████▍                            | 8066/50000 [1:27:40<7:44:12,  1.51it/s]


 16%|█████▍                            | 8067/50000 [1:27:41<7:24:59,  1.57it/s]


 16%|█████▍                            | 8068/50000 [1:27:42<7:11:38,  1.62it/s]


 16%|█████▍                            | 8069/50000 [1:27:42<7:14:10,  1.61it/s]


 16%|█████▍                            | 8070/50000 [1:27:43<7:48:28,  1.49it/s]


 16%|█████▍                            | 8071/50000 [1:27:44<7:25:19,  1.57it/s]


 16%|█████▍                            | 8072/50000 [1:27:44<7:07:45,  1.63it/s]


 16%|█████▍                            | 8073/50000 [1:27:45<7:59:51,  1.46it/s]


 16%|█████▍                            | 8074/50000 [1:27:46<7:48:22,  1.49it/s]


 16%|█████▍                            | 8075/50000 [1:27:46<7:48:13,  1.49it/s]


 16%|█████▍                            | 8076/50000 [1:27:47<8:40:22,  1.34it/s]


 16%|█████▍                            | 8077/50000 [1:27:48<8:10:07,  1.43it/s]


 16%|█████▍                            | 8078/50000 [1:27:48<7:50:06,  1.49it/s]


 16%|█████▍                            | 8079/50000 [1:27:49<7:37:23,  1.53it/s]


 16%|█████▍                            | 8080/50000 [1:27:50<7:37:57,  1.53it/s]


 16%|█████▍                            | 8081/50000 [1:27:50<7:34:19,  1.54it/s]


 16%|█████▍                            | 8082/50000 [1:27:51<7:20:05,  1.59it/s]


 16%|█████▍                            | 8083/50000 [1:27:51<7:13:46,  1.61it/s]


 16%|█████▍                            | 8084/50000 [1:27:52<7:06:43,  1.64it/s]


 16%|█████▍                            | 8085/50000 [1:27:53<6:54:11,  1.69it/s]


 16%|█████▍                            | 8086/50000 [1:27:53<7:02:26,  1.65it/s]


 16%|█████▍                            | 8087/50000 [1:27:54<7:08:05,  1.63it/s]


 16%|█████▍                            | 8088/50000 [1:27:55<7:13:25,  1.61it/s]


 16%|█████▌                            | 8089/50000 [1:27:55<7:07:47,  1.63it/s]


 16%|█████▌                            | 8090/50000 [1:27:56<7:13:42,  1.61it/s]


 16%|█████▌                            | 8091/50000 [1:27:57<7:46:20,  1.50it/s]


 16%|█████▌                            | 8092/50000 [1:27:57<8:03:04,  1.45it/s]


 16%|█████▌                            | 8093/50000 [1:27:58<7:59:29,  1.46it/s]


 16%|█████▌                            | 8094/50000 [1:27:58<7:30:20,  1.55it/s]


 16%|█████▌                            | 8095/50000 [1:27:59<7:50:17,  1.49it/s]


 16%|█████▌                            | 8096/50000 [1:28:00<8:09:35,  1.43it/s]


 16%|█████▌                            | 8097/50000 [1:28:01<7:59:59,  1.45it/s]


 16%|█████▌                            | 8098/50000 [1:28:01<7:34:35,  1.54it/s]


 16%|█████▌                            | 8099/50000 [1:28:02<7:48:08,  1.49it/s]


 16%|█████▌                            | 8100/50000 [1:28:03<7:33:26,  1.54it/s]
                                                                                
{'loss': 3.3407, 'grad_norm': 2.8052730560302734, 'learning_rate': 0.000838, 'epoch': 0.42}

 16%|█████▌                            | 8200/50000 [1:29:08<7:32:32,  1.54it/s]
                                                                                
{'loss': 3.358, 'grad_norm': 2.676485776901245, 'learning_rate': 0.0008359999999999999, 'epoch': 0.43}

 17%|█████▋                            | 8300/50000 [1:30:14<7:41:10,  1.51it/s]
                                                                                
{'loss': 3.353, 'grad_norm': 2.57322359085083, 'learning_rate': 0.000834, 'epoch': 0.43}

 17%|█████▋                            | 8400/50000 [1:31:19<6:47:42,  1.70it/s]
                                                                                
{'loss': 3.3911, 'grad_norm': 2.572262763977051, 'learning_rate': 0.000832, 'epoch': 0.44}

 17%|█████▋                            | 8400/50000 [1:31:19<6:47:42,  1.70it/s]


{'loss': 3.3533, 'grad_norm': 2.668977737426758, 'learning_rate': 0.00083, 'epoch': 0.45}

 17%|█████▊                            | 8500/50000 [1:32:25<8:05:35,  1.42it/s]


{'loss': 3.3625, 'grad_norm': 2.718226671218872, 'learning_rate': 0.000828, 'epoch': 0.45}

 17%|█████▊                            | 8600/50000 [1:33:30<6:39:02,  1.73it/s]


{'loss': 3.3545, 'grad_norm': 2.619678497314453, 'learning_rate': 0.000826, 'epoch': 0.46}

 17%|█████▉                            | 8700/50000 [1:34:36<7:55:20,  1.45it/s]




 18%|█████▉                            | 8798/50000 [1:35:40<7:03:37,  1.62it/s]


 18%|█████▉                            | 8799/50000 [1:35:41<7:09:14,  1.60it/s]


 18%|█████▉                            | 8800/50000 [1:35:41<7:57:43,  1.44it/s]
                                                                                
{'loss': 3.3596, 'grad_norm': 2.775541067123413, 'learning_rate': 0.000824, 'epoch': 0.46}

 18%|█████▉                            | 8800/50000 [1:35:41<7:57:43,  1.44it/s]


{'loss': 3.3638, 'grad_norm': 3.2291579246520996, 'learning_rate': 0.0008219999999999999, 'epoch': 0.47}

 18%|██████                            | 8900/50000 [1:36:47<7:02:40,  1.62it/s]


{'loss': 3.3687, 'grad_norm': 2.725790023803711, 'learning_rate': 0.00082, 'epoch': 0.47}

 18%|██████                            | 9000/50000 [1:37:53<7:19:37,  1.55it/s]


{'loss': 3.3546, 'grad_norm': 2.5385780334472656, 'learning_rate': 0.0008179999999999999, 'epoch': 0.48}

 18%|██████▏                           | 9100/50000 [1:39:00<7:35:29,  1.50it/s]




 18%|██████▏                           | 9167/50000 [1:39:43<7:04:09,  1.60it/s]


 18%|██████▏                           | 9168/50000 [1:39:43<7:15:46,  1.56it/s]


 18%|██████▏                           | 9169/50000 [1:39:44<7:25:18,  1.53it/s]


 18%|██████▏                           | 9170/50000 [1:39:45<7:16:18,  1.56it/s]


 18%|██████▏                           | 9171/50000 [1:39:45<7:06:08,  1.60it/s]


 18%|██████▏                           | 9172/50000 [1:39:46<7:11:55,  1.58it/s]


 18%|██████▏                           | 9173/50000 [1:39:46<7:10:53,  1.58it/s]


 18%|██████▏                           | 9174/50000 [1:39:47<6:57:11,  1.63it/s]


 18%|██████▏                           | 9175/50000 [1:39:48<7:32:12,  1.50it/s]


 18%|██████▏                           | 9176/50000 [1:39:49<7:52:53,  1.44it/s]


 18%|██████▏                           | 9177/50000 [1:39:49<7:26:37,  1.52it/s]


 18%|██████▏                           | 9178/50000 [1:39:50<7:25:00,  1.53it/s]


 18%|██████▏                           | 9179/50000 [1:39:50<7:04:30,  1.60it/s]


 18%|██████▏                           | 9180/50000 [1:39:51<6:54:04,  1.64it/s]


 18%|██████▏                           | 9181/50000 [1:39:51<6:50:31,  1.66it/s]


 18%|██████▏                           | 9182/50000 [1:39:52<6:46:49,  1.67it/s]


 18%|██████▏                           | 9183/50000 [1:39:53<7:11:28,  1.58it/s]


 18%|██████▏                           | 9184/50000 [1:39:53<7:01:27,  1.61it/s]


 18%|██████▏                           | 9185/50000 [1:39:54<6:59:32,  1.62it/s]


 18%|██████▏                           | 9186/50000 [1:39:55<7:19:06,  1.55it/s]


 18%|██████▏                           | 9187/50000 [1:39:56<8:02:42,  1.41it/s]


 18%|██████▏                           | 9188/50000 [1:39:56<7:54:09,  1.43it/s]


 18%|██████▏                           | 9189/50000 [1:39:57<8:08:08,  1.39it/s]


 18%|██████▏                           | 9190/50000 [1:39:58<8:14:00,  1.38it/s]


 18%|██████▏                           | 9191/50000 [1:39:58<8:03:20,  1.41it/s]


 18%|██████▎                           | 9192/50000 [1:39:59<8:05:47,  1.40it/s]


 18%|██████▎                           | 9193/50000 [1:40:00<7:50:02,  1.45it/s]


 18%|██████▎                           | 9194/50000 [1:40:00<7:32:14,  1.50it/s]


 18%|██████▎                           | 9195/50000 [1:40:01<7:35:36,  1.49it/s]


 18%|██████▎                           | 9196/50000 [1:40:02<7:20:55,  1.54it/s]


 18%|██████▎                           | 9197/50000 [1:40:02<7:21:03,  1.54it/s]


 18%|██████▎                           | 9198/50000 [1:40:03<7:06:51,  1.59it/s]


 18%|██████▎                           | 9199/50000 [1:40:03<6:58:15,  1.63it/s]


 18%|██████▎                           | 9200/50000 [1:40:04<7:23:42,  1.53it/s]
                                                                                
{'loss': 3.3567, 'grad_norm': 2.9383485317230225, 'learning_rate': 0.000816, 'epoch': 0.48}

 19%|██████▎                           | 9300/50000 [1:41:09<7:13:06,  1.57it/s]
                                                                                
{'loss': 3.3633, 'grad_norm': 2.940767288208008, 'learning_rate': 0.0008139999999999999, 'epoch': 0.49}

 19%|██████▍                           | 9400/50000 [1:42:15<6:41:30,  1.69it/s]
                                                                                
{'loss': 3.346, 'grad_norm': 2.7302396297454834, 'learning_rate': 0.0008120000000000001, 'epoch': 0.49}

 19%|██████▍                           | 9500/50000 [1:43:20<7:25:53,  1.51it/s]
                                                                                
{'loss': 3.3296, 'grad_norm': 3.130507469177246, 'learning_rate': 0.0008100000000000001, 'epoch': 0.5}

 19%|██████▌                           | 9600/50000 [1:44:25<7:13:22,  1.55it/s]
                                                                                
{'loss': 3.3575, 'grad_norm': 3.2082791328430176, 'learning_rate': 0.000808, 'epoch': 0.5}

 19%|██████▌                           | 9700/50000 [1:45:30<6:27:16,  1.73it/s]
                                                                                
{'loss': 3.3677, 'grad_norm': 2.757967948913574, 'learning_rate': 0.0008060000000000001, 'epoch': 0.51}

 20%|██████▋                           | 9800/50000 [1:46:36<7:21:56,  1.52it/s]
                                                                                
{'loss': 3.3265, 'grad_norm': 2.660703182220459, 'learning_rate': 0.000804, 'epoch': 0.51}

 20%|██████▋                           | 9900/50000 [1:47:40<8:01:25,  1.39it/s]
                                                                                
{'loss': 3.4028, 'grad_norm': 2.8550899028778076, 'learning_rate': 0.0008020000000000001, 'epoch': 0.52}



 20%|██████▌                          | 10000/50000 [1:48:45<7:04:27,  1.57it/s]
                                                                                
{'loss': 3.3921, 'grad_norm': 2.710148811340332, 'learning_rate': 0.0008, 'epoch': 0.52}

***** Running Evaluation *****
  Num examples = 50
  Batch size = 16




{'eval_rouge-1': 33.043336, 'eval_rouge-2': 7.3128660000000005, 'eval_rouge-l': 26.826463999999998, 'eval_bleu-4': 0.039210339534922145, 'eval_runtime': 8.7658, 'eval_samples_per_second': 5.704, 'eval_steps_per_second': 0.456, 'epoch': 0.52}

Saving model checkpoint to ./output/tmp-checkpoint-10000


tokenizer config file saved in ./output/tmp-checkpoint-10000/tokenizer_config.json
Special tokens file saved in ./output/tmp-checkpoint-10000/special_tokens_map.json





 20%|██████▋                          | 10100/50000 [1:49:58<7:17:23,  1.52it/s]
                                                                                
{'loss': 3.3348, 'grad_norm': 2.6869916915893555, 'learning_rate': 0.0007980000000000001, 'epoch': 0.53}



 20%|██████▋                          | 10200/50000 [1:51:02<7:03:20,  1.57it/s]
                                                                                
{'loss': 3.3457, 'grad_norm': 2.7898879051208496, 'learning_rate': 0.000796, 'epoch': 0.53}



 21%|██████▊                          | 10261/50000 [1:51:41<7:14:33,  1.52it/s]


 21%|██████▊                          | 10262/50000 [1:51:42<7:33:05,  1.46it/s]


 21%|██████▊                          | 10263/50000 [1:51:43<7:12:50,  1.53it/s]


 21%|██████▊                          | 10264/50000 [1:51:43<7:12:22,  1.53it/s]


 21%|██████▊                          | 10265/50000 [1:51:44<7:44:49,  1.42it/s]


 21%|██████▊                          | 10266/50000 [1:51:45<7:39:57,  1.44it/s]


 21%|██████▊                          | 10267/50000 [1:51:45<7:21:22,  1.50it/s]


 21%|██████▊                          | 10268/50000 [1:51:46<7:03:23,  1.56it/s]


 21%|██████▊                          | 10269/50000 [1:51:47<7:17:14,  1.51it/s]


 21%|██████▊                          | 10270/50000 [1:51:47<7:53:09,  1.40it/s]


 21%|██████▊                          | 10271/50000 [1:51:48<7:23:20,  1.49it/s]


 21%|██████▊                          | 10272/50000 [1:51:49<7:26:14,  1.48it/s]


 21%|██████▊                          | 10273/50000 [1:51:49<7:05:32,  1.56it/s]


 21%|██████▊                          | 10274/50000 [1:51:50<7:08:29,  1.55it/s]


 21%|██████▊                          | 10275/50000 [1:51:51<7:09:20,  1.54it/s]


 21%|██████▊                          | 10276/50000 [1:51:51<7:05:45,  1.56it/s]


 21%|██████▊                          | 10277/50000 [1:51:52<7:12:31,  1.53it/s]


 21%|██████▊                          | 10278/50000 [1:51:52<6:59:41,  1.58it/s]


 21%|██████▊                          | 10279/50000 [1:51:53<6:49:28,  1.62it/s]


 21%|██████▊                          | 10280/50000 [1:51:54<6:56:05,  1.59it/s]


 21%|██████▊                          | 10281/50000 [1:51:54<6:49:20,  1.62it/s]


 21%|██████▊                          | 10282/50000 [1:51:55<7:00:59,  1.57it/s]


 21%|██████▊                          | 10283/50000 [1:51:55<6:49:24,  1.62it/s]


 21%|██████▊                          | 10284/50000 [1:51:56<6:50:54,  1.61it/s]


 21%|██████▊                          | 10285/50000 [1:51:57<6:53:20,  1.60it/s]


 21%|██████▊                          | 10286/50000 [1:51:57<6:45:13,  1.63it/s]


 21%|██████▊                          | 10287/50000 [1:51:58<6:44:12,  1.64it/s]


 21%|██████▊                          | 10288/50000 [1:51:59<7:08:03,  1.55it/s]


 21%|██████▊                          | 10289/50000 [1:51:59<6:51:29,  1.61it/s]


 21%|██████▊                          | 10290/50000 [1:52:00<6:37:31,  1.66it/s]


 21%|██████▊                          | 10291/50000 [1:52:00<6:36:26,  1.67it/s]


 21%|██████▊                          | 10292/50000 [1:52:01<6:45:44,  1.63it/s]


 21%|██████▊                          | 10293/50000 [1:52:02<6:48:30,  1.62it/s]


 21%|██████▊                          | 10294/50000 [1:52:02<7:14:43,  1.52it/s]


 21%|██████▊                          | 10295/50000 [1:52:03<7:29:39,  1.47it/s]


 21%|██████▊                          | 10296/50000 [1:52:04<7:23:18,  1.49it/s]


 21%|██████▊                          | 10297/50000 [1:52:04<7:17:10,  1.51it/s]


 21%|██████▊                          | 10298/50000 [1:52:05<7:20:19,  1.50it/s]


 21%|██████▊                          | 10299/50000 [1:52:06<7:06:14,  1.55it/s]


 21%|██████▊                          | 10300/50000 [1:52:06<7:12:49,  1.53it/s]
                                                                                
{'loss': 3.4054, 'grad_norm': 2.5819284915924072, 'learning_rate': 0.0007940000000000001, 'epoch': 0.54}

 21%|██████▊                          | 10300/50000 [1:52:06<7:12:49,  1.53it/s]


{'loss': 3.3525, 'grad_norm': 3.6891331672668457, 'learning_rate': 0.0007920000000000001, 'epoch': 0.54}

 21%|██████▊                          | 10400/50000 [1:53:11<7:05:42,  1.55it/s]


{'loss': 3.3896, 'grad_norm': 3.0075747966766357, 'learning_rate': 0.00079, 'epoch': 0.55}

 21%|██████▉                          | 10500/50000 [1:54:17<7:18:10,  1.50it/s]


{'loss': 3.3946, 'grad_norm': 2.9829928874969482, 'learning_rate': 0.0007880000000000001, 'epoch': 0.55}

 21%|██████▉                          | 10600/50000 [1:55:22<6:27:34,  1.69it/s]


{'loss': 3.3385, 'grad_norm': 3.311107635498047, 'learning_rate': 0.000786, 'epoch': 0.56}

 21%|███████                          | 10700/50000 [1:56:27<7:28:52,  1.46it/s]


{'loss': 3.3155, 'grad_norm': 3.5169870853424072, 'learning_rate': 0.0007840000000000001, 'epoch': 0.57}

 22%|███████▏                         | 10800/50000 [1:57:33<7:18:59,  1.49it/s]


{'loss': 3.3564, 'grad_norm': 3.084613561630249, 'learning_rate': 0.000782, 'epoch': 0.57}

 22%|███████▏                         | 10900/50000 [1:58:37<6:09:46,  1.76it/s]


 22%|███████▎                         | 11000/50000 [1:59:44<7:22:23,  1.47it/s]
                                                                                
{'loss': 3.3515, 'grad_norm': 2.7052907943725586, 'learning_rate': 0.0007800000000000001, 'epoch': 0.58}

 22%|███████▎                         | 11000/50000 [1:59:44<7:22:23,  1.47it/s]


{'loss': 3.3556, 'grad_norm': 6.795851707458496, 'learning_rate': 0.000778, 'epoch': 0.58}

 22%|███████▎                         | 11100/50000 [2:00:49<6:42:48,  1.61it/s]


{'loss': 3.3991, 'grad_norm': 3.5254292488098145, 'learning_rate': 0.000776, 'epoch': 0.59}

 22%|███████▍                         | 11200/50000 [2:01:54<6:30:52,  1.65it/s]


{'loss': 3.3741, 'grad_norm': 3.0374321937561035, 'learning_rate': 0.0007740000000000001, 'epoch': 0.59}

 23%|███████▍                         | 11300/50000 [2:02:59<7:56:35,  1.35it/s]




 23%|███████▌                         | 11371/50000 [2:03:44<6:32:35,  1.64it/s]


 23%|███████▌                         | 11372/50000 [2:03:45<6:45:31,  1.59it/s]


 23%|███████▌                         | 11373/50000 [2:03:46<6:48:30,  1.58it/s]


 23%|███████▌                         | 11374/50000 [2:03:47<7:08:14,  1.50it/s]


 23%|███████▌                         | 11375/50000 [2:03:47<7:35:54,  1.41it/s]


 23%|███████▌                         | 11376/50000 [2:03:48<7:15:14,  1.48it/s]


 23%|███████▌                         | 11377/50000 [2:03:49<7:43:23,  1.39it/s]


 23%|███████▌                         | 11378/50000 [2:03:49<7:27:45,  1.44it/s]


 23%|███████▌                         | 11379/50000 [2:03:50<7:09:30,  1.50it/s]


 23%|███████▌                         | 11380/50000 [2:03:51<7:04:02,  1.52it/s]


 23%|███████▌                         | 11381/50000 [2:03:51<6:52:51,  1.56it/s]


 23%|███████▌                         | 11382/50000 [2:03:52<7:05:58,  1.51it/s]


 23%|███████▌                         | 11383/50000 [2:03:53<7:10:23,  1.50it/s]


 23%|███████▌                         | 11384/50000 [2:03:53<7:24:56,  1.45it/s]


 23%|███████▌                         | 11385/50000 [2:03:54<7:06:20,  1.51it/s]


 23%|███████▌                         | 11386/50000 [2:03:54<6:28:29,  1.66it/s]


 23%|███████▌                         | 11387/50000 [2:03:55<6:37:43,  1.62it/s]


 23%|███████▌                         | 11388/50000 [2:03:56<6:30:55,  1.65it/s]


 23%|███████▌                         | 11389/50000 [2:03:56<6:37:01,  1.62it/s]


 23%|███████▌                         | 11390/50000 [2:03:57<6:42:23,  1.60it/s]


 23%|███████▌                         | 11391/50000 [2:03:57<6:28:04,  1.66it/s]


 23%|███████▌                         | 11392/50000 [2:03:58<6:36:10,  1.62it/s]


 23%|███████▌                         | 11393/50000 [2:03:59<6:29:54,  1.65it/s]


 23%|███████▌                         | 11394/50000 [2:03:59<6:51:07,  1.57it/s]


 23%|███████▌                         | 11395/50000 [2:04:00<6:41:24,  1.60it/s]


 23%|███████▌                         | 11396/50000 [2:04:01<6:43:12,  1.60it/s]


 23%|███████▌                         | 11397/50000 [2:04:01<6:34:39,  1.63it/s]


 23%|███████▌                         | 11398/50000 [2:04:02<6:44:29,  1.59it/s]


 23%|███████▌                         | 11399/50000 [2:04:03<6:48:52,  1.57it/s]


 23%|███████▌                         | 11400/50000 [2:04:03<6:40:54,  1.60it/s]
                                                                                
{'loss': 3.3992, 'grad_norm': 3.428956985473633, 'learning_rate': 0.000772, 'epoch': 0.6}

 23%|███████▌                         | 11500/50000 [2:05:07<6:43:29,  1.59it/s]
                                                                                
{'loss': 3.3175, 'grad_norm': 4.051909923553467, 'learning_rate': 0.0007700000000000001, 'epoch': 0.6}

 23%|███████▋                         | 11600/50000 [2:06:13<6:53:42,  1.55it/s]
                                                                                
{'loss': 3.3785, 'grad_norm': 3.292431116104126, 'learning_rate': 0.000768, 'epoch': 0.61}

 23%|███████▋                         | 11700/50000 [2:07:17<6:38:20,  1.60it/s]
                                                                                
{'loss': 3.385, 'grad_norm': 2.8310530185699463, 'learning_rate': 0.0007660000000000001, 'epoch': 0.61}

 23%|███████▋                         | 11700/50000 [2:07:17<6:38:20,  1.60it/s]


 24%|███████▊                         | 11800/50000 [2:08:23<6:39:06,  1.60it/s]
                                                                                
{'loss': 3.328, 'grad_norm': 2.782923460006714, 'learning_rate': 0.000764, 'epoch': 0.62}

 24%|███████▊                         | 11900/50000 [2:09:28<7:06:05,  1.49it/s]
                                                                                
{'loss': 3.3828, 'grad_norm': 2.603227376937866, 'learning_rate': 0.000762, 'epoch': 0.62}

 24%|███████▉                         | 12000/50000 [2:10:33<6:48:16,  1.55it/s]
                                                                                
{'loss': 3.3544, 'grad_norm': 2.63329815864563, 'learning_rate': 0.00076, 'epoch': 0.63}

 24%|███████▉                         | 12100/50000 [2:11:37<6:45:44,  1.56it/s]
                                                                                
{'loss': 3.4397, 'grad_norm': 4.172961711883545, 'learning_rate': 0.000758, 'epoch': 0.63}

 24%|████████                         | 12200/50000 [2:12:40<6:45:32,  1.55it/s]
                                                                                
{'loss': 3.3662, 'grad_norm': 3.093020439147949, 'learning_rate': 0.000756, 'epoch': 0.64}

 25%|████████                         | 12300/50000 [2:13:45<6:47:51,  1.54it/s]
                                                                                
{'loss': 3.3801, 'grad_norm': 3.0010714530944824, 'learning_rate': 0.000754, 'epoch': 0.64}

 25%|████████▏                        | 12400/50000 [2:14:50<7:14:42,  1.44it/s]
                                                                                
{'loss': 3.352, 'grad_norm': 2.62391996383667, 'learning_rate': 0.0007520000000000001, 'epoch': 0.65}



 25%|████████▏                        | 12481/50000 [2:15:41<6:24:56,  1.62it/s]


 25%|████████▏                        | 12482/50000 [2:15:42<7:09:59,  1.45it/s]


 25%|████████▏                        | 12483/50000 [2:15:43<6:46:49,  1.54it/s]


 25%|████████▏                        | 12484/50000 [2:15:43<6:38:41,  1.57it/s]


 25%|████████▏                        | 12485/50000 [2:15:44<6:29:28,  1.61it/s]


 25%|████████▏                        | 12486/50000 [2:15:45<6:34:49,  1.58it/s]


 25%|████████▏                        | 12487/50000 [2:15:45<6:44:39,  1.55it/s]


 25%|████████▏                        | 12488/50000 [2:15:46<6:32:28,  1.59it/s]


 25%|████████▏                        | 12489/50000 [2:15:46<6:21:32,  1.64it/s]


 25%|████████▏                        | 12490/50000 [2:15:47<6:19:47,  1.65it/s]


 25%|████████▏                        | 12491/50000 [2:15:48<6:31:39,  1.60it/s]


 25%|████████▏                        | 12492/50000 [2:15:48<6:41:05,  1.56it/s]


 25%|████████▏                        | 12493/50000 [2:15:49<6:45:21,  1.54it/s]


 25%|████████▏                        | 12494/50000 [2:15:50<6:51:58,  1.52it/s]


 25%|████████▏                        | 12495/50000 [2:15:50<6:35:05,  1.58it/s]


 25%|████████▏                        | 12496/50000 [2:15:51<6:44:46,  1.54it/s]


 25%|████████▏                        | 12497/50000 [2:15:52<6:36:03,  1.58it/s]


 25%|████████▏                        | 12498/50000 [2:15:52<6:34:40,  1.58it/s]


 25%|████████▏                        | 12499/50000 [2:15:53<6:29:02,  1.61it/s]


 25%|████████▎                        | 12500/50000 [2:15:53<6:35:47,  1.58it/s]
                                                                                
{'loss': 3.3761, 'grad_norm': 3.781750440597534, 'learning_rate': 0.00075, 'epoch': 0.65}

 25%|████████▎                        | 12600/50000 [2:16:58<6:38:56,  1.56it/s]
                                                                                
{'loss': 3.4086, 'grad_norm': 2.8407270908355713, 'learning_rate': 0.000748, 'epoch': 0.66}

 25%|████████▍                        | 12700/50000 [2:18:02<7:29:08,  1.38it/s]
                                                                                
{'loss': 3.3502, 'grad_norm': 2.847236394882202, 'learning_rate': 0.000746, 'epoch': 0.66}

 26%|████████▍                        | 12800/50000 [2:19:10<7:21:54,  1.40it/s]
                                                                                
{'loss': 3.3619, 'grad_norm': 3.7995355129241943, 'learning_rate': 0.000744, 'epoch': 0.67}



 26%|████████▍                        | 12850/50000 [2:19:42<6:41:00,  1.54it/s]


 26%|████████▍                        | 12851/50000 [2:19:43<6:55:30,  1.49it/s]


 26%|████████▍                        | 12852/50000 [2:19:44<7:39:20,  1.35it/s]


 26%|████████▍                        | 12853/50000 [2:19:45<7:13:14,  1.43it/s]


 26%|████████▍                        | 12854/50000 [2:19:45<6:52:30,  1.50it/s]


 26%|████████▍                        | 12855/50000 [2:19:46<6:48:51,  1.51it/s]


 26%|████████▍                        | 12856/50000 [2:19:47<7:02:41,  1.46it/s]


 26%|████████▍                        | 12857/50000 [2:19:47<7:00:53,  1.47it/s]


 26%|████████▍                        | 12858/50000 [2:19:48<7:06:20,  1.45it/s]


 26%|████████▍                        | 12859/50000 [2:19:49<7:15:27,  1.42it/s]


 26%|████████▍                        | 12860/50000 [2:19:50<7:16:25,  1.42it/s]


 26%|████████▍                        | 12861/50000 [2:19:50<6:52:26,  1.50it/s]


 26%|████████▍                        | 12862/50000 [2:19:51<6:43:56,  1.53it/s]


 26%|████████▍                        | 12863/50000 [2:19:51<6:59:29,  1.48it/s]


 26%|████████▍                        | 12864/50000 [2:19:52<6:58:18,  1.48it/s]


 26%|████████▍                        | 12865/50000 [2:19:53<7:08:29,  1.44it/s]


 26%|████████▍                        | 12866/50000 [2:19:54<7:11:36,  1.43it/s]


 26%|████████▍                        | 12867/50000 [2:19:54<7:05:27,  1.45it/s]


 26%|████████▍                        | 12868/50000 [2:19:55<6:49:19,  1.51it/s]


 26%|████████▍                        | 12869/50000 [2:19:55<6:46:32,  1.52it/s]


 26%|████████▍                        | 12870/50000 [2:19:56<6:42:11,  1.54it/s]


 26%|████████▍                        | 12871/50000 [2:19:57<6:39:07,  1.55it/s]


 26%|████████▍                        | 12872/50000 [2:19:57<6:52:35,  1.50it/s]


 26%|████████▍                        | 12873/50000 [2:19:58<6:38:08,  1.55it/s]


 26%|████████▍                        | 12874/50000 [2:19:59<6:26:50,  1.60it/s]


 26%|████████▍                        | 12875/50000 [2:19:59<6:28:20,  1.59it/s]


 26%|████████▍                        | 12876/50000 [2:20:00<6:23:31,  1.61it/s]


 26%|████████▍                        | 12877/50000 [2:20:00<6:20:01,  1.63it/s]


 26%|████████▍                        | 12878/50000 [2:20:01<6:17:38,  1.64it/s]


 26%|████████▌                        | 12879/50000 [2:20:02<6:43:26,  1.53it/s]


 26%|████████▌                        | 12880/50000 [2:20:02<6:19:17,  1.63it/s]


 26%|████████▌                        | 12881/50000 [2:20:03<6:27:39,  1.60it/s]


 26%|████████▌                        | 12882/50000 [2:20:04<6:23:09,  1.61it/s]


 26%|████████▌                        | 12883/50000 [2:20:04<6:26:22,  1.60it/s]


 26%|████████▌                        | 12884/50000 [2:20:05<6:17:35,  1.64it/s]


 26%|████████▌                        | 12885/50000 [2:20:06<6:47:04,  1.52it/s]


 26%|████████▌                        | 12886/50000 [2:20:06<7:15:25,  1.42it/s]


 26%|████████▌                        | 12887/50000 [2:20:07<7:21:56,  1.40it/s]


 26%|████████▌                        | 12888/50000 [2:20:08<7:22:25,  1.40it/s]


 26%|████████▌                        | 12889/50000 [2:20:08<7:08:15,  1.44it/s]


 26%|████████▌                        | 12890/50000 [2:20:09<6:51:18,  1.50it/s]


 26%|████████▌                        | 12891/50000 [2:20:10<6:32:34,  1.58it/s]


 26%|████████▌                        | 12892/50000 [2:20:10<6:41:44,  1.54it/s]


 26%|████████▌                        | 12893/50000 [2:20:11<6:59:09,  1.48it/s]


 26%|████████▌                        | 12894/50000 [2:20:12<6:55:34,  1.49it/s]


 26%|████████▌                        | 12895/50000 [2:20:12<7:07:31,  1.45it/s]


 26%|████████▌                        | 12896/50000 [2:20:13<7:02:36,  1.46it/s]


 26%|████████▌                        | 12897/50000 [2:20:14<6:42:39,  1.54it/s]


 26%|████████▌                        | 12898/50000 [2:20:14<6:26:23,  1.60it/s]


 26%|████████▌                        | 12899/50000 [2:20:15<6:19:40,  1.63it/s]


 26%|████████▌                        | 12900/50000 [2:20:16<6:25:34,  1.60it/s]
                                                                                
{'loss': 3.3625, 'grad_norm': 3.0155277252197266, 'learning_rate': 0.000742, 'epoch': 0.68}

 26%|████████▌                        | 13000/50000 [2:21:21<6:45:13,  1.52it/s]
                                                                                
{'loss': 3.3552, 'grad_norm': 3.0337395668029785, 'learning_rate': 0.00074, 'epoch': 0.68}

 26%|████████▋                        | 13100/50000 [2:22:26<6:39:28,  1.54it/s]
                                                                                
{'loss': 3.3668, 'grad_norm': 2.972939968109131, 'learning_rate': 0.000738, 'epoch': 0.69}

 26%|████████▋                        | 13200/50000 [2:23:32<6:56:52,  1.47it/s]
                                                                                
{'loss': 3.3921, 'grad_norm': 3.21974515914917, 'learning_rate': 0.000736, 'epoch': 0.69}

 27%|████████▊                        | 13300/50000 [2:24:37<6:32:49,  1.56it/s]
                                                                                
{'loss': 3.3976, 'grad_norm': 4.648677825927734, 'learning_rate': 0.000734, 'epoch': 0.7}

 27%|████████▊                        | 13400/50000 [2:25:41<6:47:57,  1.50it/s]
                                                                                
{'loss': 3.4178, 'grad_norm': 3.3885910511016846, 'learning_rate': 0.000732, 'epoch': 0.7}

 27%|████████▉                        | 13500/50000 [2:26:46<7:16:54,  1.39it/s]
                                                                                
{'loss': 3.3907, 'grad_norm': 3.2163681983947754, 'learning_rate': 0.00073, 'epoch': 0.71}



 27%|████████▉                        | 13591/50000 [2:27:45<6:10:46,  1.64it/s]


 27%|████████▉                        | 13592/50000 [2:27:46<6:29:45,  1.56it/s]


 27%|████████▉                        | 13593/50000 [2:27:47<6:28:08,  1.56it/s]


 27%|████████▉                        | 13594/50000 [2:27:47<6:46:56,  1.49it/s]


 27%|████████▉                        | 13595/50000 [2:27:48<6:43:26,  1.50it/s]


 27%|████████▉                        | 13596/50000 [2:27:49<6:53:04,  1.47it/s]


 27%|████████▉                        | 13597/50000 [2:27:49<7:04:18,  1.43it/s]


 27%|████████▉                        | 13598/50000 [2:27:50<7:12:11,  1.40it/s]


 27%|████████▉                        | 13599/50000 [2:27:51<6:57:54,  1.45it/s]


 27%|████████▉                        | 13600/50000 [2:27:52<7:26:59,  1.36it/s]
                                                                                
{'loss': 3.4038, 'grad_norm': 3.088707685470581, 'learning_rate': 0.000728, 'epoch': 0.71}

 27%|█████████                        | 13700/50000 [2:28:56<6:25:25,  1.57it/s]
                                                                                
{'loss': 3.3968, 'grad_norm': 3.0897293090820312, 'learning_rate': 0.000726, 'epoch': 0.72}

 28%|█████████                        | 13800/50000 [2:30:01<6:48:44,  1.48it/s]
                                                                                
{'loss': 3.3551, 'grad_norm': 2.7833919525146484, 'learning_rate': 0.000724, 'epoch': 0.72}

 28%|█████████▏                       | 13900/50000 [2:31:06<6:45:08,  1.49it/s]
                                                                                
{'loss': 3.3925, 'grad_norm': 3.3978703022003174, 'learning_rate': 0.000722, 'epoch': 0.73}



 28%|█████████▏                       | 13960/50000 [2:31:45<6:26:22,  1.55it/s]


 28%|█████████▏                       | 13961/50000 [2:31:46<6:15:37,  1.60it/s]


 28%|█████████▏                       | 13962/50000 [2:31:46<6:24:58,  1.56it/s]


 28%|█████████▏                       | 13963/50000 [2:31:47<6:03:35,  1.65it/s]


 28%|█████████▏                       | 13964/50000 [2:31:47<5:52:47,  1.70it/s]


 28%|█████████▏                       | 13965/50000 [2:31:48<5:53:00,  1.70it/s]


 28%|█████████▏                       | 13966/50000 [2:31:49<6:39:55,  1.50it/s]


 28%|█████████▏                       | 13967/50000 [2:31:50<6:48:52,  1.47it/s]


 28%|█████████▏                       | 13968/50000 [2:31:50<6:58:57,  1.43it/s]


 28%|█████████▏                       | 13969/50000 [2:31:51<6:38:13,  1.51it/s]


 28%|█████████▏                       | 13970/50000 [2:31:51<6:07:19,  1.63it/s]


 28%|█████████▏                       | 13971/50000 [2:31:52<6:04:22,  1.65it/s]


 28%|█████████▏                       | 13972/50000 [2:31:53<6:34:48,  1.52it/s]


 28%|█████████▏                       | 13973/50000 [2:31:53<7:02:14,  1.42it/s]


 28%|█████████▏                       | 13974/50000 [2:31:54<6:34:54,  1.52it/s]


 28%|█████████▏                       | 13975/50000 [2:31:55<6:20:20,  1.58it/s]


 28%|█████████▏                       | 13976/50000 [2:31:55<6:37:52,  1.51it/s]


 28%|█████████▏                       | 13977/50000 [2:31:56<6:24:33,  1.56it/s]


 28%|█████████▏                       | 13978/50000 [2:31:57<6:32:16,  1.53it/s]


 28%|█████████▏                       | 13979/50000 [2:31:57<6:34:17,  1.52it/s]


 28%|█████████▏                       | 13980/50000 [2:31:58<6:31:08,  1.53it/s]


 28%|█████████▏                       | 13981/50000 [2:31:59<6:31:12,  1.53it/s]


 28%|█████████▏                       | 13982/50000 [2:31:59<6:41:34,  1.49it/s]


 28%|█████████▏                       | 13983/50000 [2:32:00<6:14:04,  1.60it/s]


 28%|█████████▏                       | 13984/50000 [2:32:00<6:09:04,  1.63it/s]


 28%|█████████▏                       | 13985/50000 [2:32:01<6:04:24,  1.65it/s]


 28%|█████████▏                       | 13986/50000 [2:32:02<6:04:28,  1.65it/s]


 28%|█████████▏                       | 13987/50000 [2:32:02<6:08:00,  1.63it/s]


 28%|█████████▏                       | 13988/50000 [2:32:03<6:31:21,  1.53it/s]


 28%|█████████▏                       | 13989/50000 [2:32:04<6:34:48,  1.52it/s]


 28%|█████████▏                       | 13990/50000 [2:32:04<7:05:33,  1.41it/s]


 28%|█████████▏                       | 13991/50000 [2:32:05<6:49:40,  1.46it/s]


 28%|█████████▏                       | 13992/50000 [2:32:06<6:31:40,  1.53it/s]


 28%|█████████▏                       | 13993/50000 [2:32:06<6:41:52,  1.49it/s]


 28%|█████████▏                       | 13994/50000 [2:32:07<6:27:18,  1.55it/s]


 28%|█████████▏                       | 13995/50000 [2:32:08<6:18:16,  1.59it/s]


 28%|█████████▏                       | 13996/50000 [2:32:08<6:11:52,  1.61it/s]


 28%|█████████▏                       | 13997/50000 [2:32:09<6:19:52,  1.58it/s]


 28%|█████████▏                       | 13998/50000 [2:32:09<6:19:58,  1.58it/s]


 28%|█████████▏                       | 13999/50000 [2:32:10<6:21:02,  1.57it/s]


 28%|█████████▏                       | 14000/50000 [2:32:11<6:12:40,  1.61it/s]
                                                                                
{'loss': 3.4322, 'grad_norm': 3.2538697719573975, 'learning_rate': 0.0007199999999999999, 'epoch': 0.73}

 28%|█████████▎                       | 14100/50000 [2:33:16<6:51:49,  1.45it/s]
                                                                                
{'loss': 3.3636, 'grad_norm': 2.8608436584472656, 'learning_rate': 0.000718, 'epoch': 0.74}

 28%|█████████▎                       | 14200/50000 [2:34:22<6:18:43,  1.58it/s]
                                                                                
{'loss': 3.3572, 'grad_norm': 3.09806489944458, 'learning_rate': 0.000716, 'epoch': 0.74}

 29%|█████████▍                       | 14300/50000 [2:35:27<6:15:10,  1.59it/s]
                                                                                
{'loss': 3.3412, 'grad_norm': 3.1498324871063232, 'learning_rate': 0.000714, 'epoch': 0.75}

 29%|█████████▌                       | 14400/50000 [2:36:32<6:42:01,  1.48it/s]
                                                                                
{'loss': 3.3856, 'grad_norm': 2.9334259033203125, 'learning_rate': 0.000712, 'epoch': 0.75}

 29%|█████████▌                       | 14500/50000 [2:37:37<6:37:49,  1.49it/s]
                                                                                
{'loss': 3.3659, 'grad_norm': 2.984023332595825, 'learning_rate': 0.00071, 'epoch': 0.76}

 29%|█████████▋                       | 14600/50000 [2:38:42<6:22:04,  1.54it/s]


                                                                                
{'loss': 3.3822, 'grad_norm': 3.0044798851013184, 'learning_rate': 0.000708, 'epoch': 0.76}

 29%|█████████▋                       | 14700/50000 [2:39:47<6:49:40,  1.44it/s]
                                                                                
{'loss': 3.3761, 'grad_norm': 3.4126434326171875, 'learning_rate': 0.0007059999999999999, 'epoch': 0.77}

 30%|█████████▊                       | 14800/50000 [2:40:53<6:32:47,  1.49it/s]
                                                                                
{'loss': 3.3754, 'grad_norm': 2.925963878631592, 'learning_rate': 0.000704, 'epoch': 0.77}

 30%|█████████▊                       | 14900/50000 [2:41:56<6:02:09,  1.62it/s]
                                                                                
{'loss': 3.3746, 'grad_norm': 3.0613176822662354, 'learning_rate': 0.0007019999999999999, 'epoch': 0.78}

 30%|█████████▉                       | 15000/50000 [2:43:01<6:34:22,  1.48it/s]
                                                                                
{'loss': 3.3641, 'grad_norm': 3.584228754043579, 'learning_rate': 0.0007, 'epoch': 0.79}

***** Running Evaluation *****
  Num examples = 50
  Batch size = 16




100%|█████████████████████████████████████████████| 4/4 [00:10<00:00,  2.53s/it]
{'eval_rouge-1': 32.061062, 'eval_rouge-2': 7.405709999999999, 'eval_rouge-l': 24.812397999999998, 'eval_bleu-4': 0.03586395711198206, 'eval_runtime': 17.0859, 'eval_samples_per_second': 2.926, 'eval_steps_per_second': 0.234, 'epoch': 0.79}

Saving model checkpoint to ./output/tmp-checkpoint-15000


tokenizer config file saved in ./output/tmp-checkpoint-15000/tokenizer_config.json
Special tokens file saved in ./output/tmp-checkpoint-15000/special_tokens_map.json



 30%|█████████▉                       | 15053/50000 [2:43:52<6:11:54,  1.57it/s]


 30%|█████████▉                       | 15054/50000 [2:43:52<6:24:24,  1.52it/s]


 30%|█████████▉                       | 15055/50000 [2:43:53<6:14:01,  1.56it/s]


 30%|█████████▉                       | 15056/50000 [2:43:54<6:28:15,  1.50it/s]


 30%|█████████▉                       | 15057/50000 [2:43:54<6:14:32,  1.55it/s]


 30%|█████████▉                       | 15058/50000 [2:43:55<6:16:02,  1.55it/s]


 30%|█████████▉                       | 15059/50000 [2:43:56<6:27:10,  1.50it/s]


 30%|█████████▉                       | 15060/50000 [2:43:56<6:13:46,  1.56it/s]


 30%|█████████▉                       | 15061/50000 [2:43:57<6:28:02,  1.50it/s]


 30%|█████████▉                       | 15062/50000 [2:43:58<6:25:28,  1.51it/s]


 30%|█████████▉                       | 15063/50000 [2:43:58<6:37:17,  1.47it/s]


 30%|█████████▉                       | 15064/50000 [2:43:59<6:15:26,  1.55it/s]


 30%|█████████▉                       | 15065/50000 [2:43:59<6:20:01,  1.53it/s]


 30%|█████████▉                       | 15066/50000 [2:44:00<7:04:05,  1.37it/s]


 30%|█████████▉                       | 15067/50000 [2:44:01<6:49:31,  1.42it/s]


 30%|█████████▉                       | 15068/50000 [2:44:02<6:38:14,  1.46it/s]


 30%|█████████▉                       | 15069/50000 [2:44:02<6:32:30,  1.48it/s]


 30%|█████████▉                       | 15070/50000 [2:44:03<6:16:33,  1.55it/s]


 30%|█████████▉                       | 15071/50000 [2:44:03<6:06:21,  1.59it/s]


 30%|█████████▉                       | 15072/50000 [2:44:04<6:07:58,  1.58it/s]


 30%|█████████▉                       | 15073/50000 [2:44:05<6:41:00,  1.45it/s]


 30%|█████████▉                       | 15074/50000 [2:44:06<6:21:17,  1.53it/s]


 30%|█████████▉                       | 15075/50000 [2:44:06<6:56:14,  1.40it/s]


 30%|█████████▉                       | 15076/50000 [2:44:07<7:06:38,  1.36it/s]


 30%|█████████▉                       | 15077/50000 [2:44:08<6:38:13,  1.46it/s]


 30%|█████████▉                       | 15078/50000 [2:44:08<6:18:15,  1.54it/s]


 30%|█████████▉                       | 15079/50000 [2:44:09<6:30:07,  1.49it/s]


 30%|█████████▉                       | 15080/50000 [2:44:10<6:25:58,  1.51it/s]


 30%|█████████▉                       | 15081/50000 [2:44:10<6:34:28,  1.48it/s]


 30%|█████████▉                       | 15082/50000 [2:44:11<6:17:03,  1.54it/s]


 30%|█████████▉                       | 15083/50000 [2:44:12<6:31:49,  1.49it/s]


 30%|█████████▉                       | 15084/50000 [2:44:12<6:31:51,  1.49it/s]


 30%|█████████▉                       | 15085/50000 [2:44:13<6:27:03,  1.50it/s]


 30%|█████████▉                       | 15086/50000 [2:44:14<6:45:00,  1.44it/s]


 30%|█████████▉                       | 15087/50000 [2:44:14<6:31:48,  1.49it/s]


 30%|█████████▉                       | 15088/50000 [2:44:15<6:18:33,  1.54it/s]


 30%|█████████▉                       | 15089/50000 [2:44:16<6:22:13,  1.52it/s]


 30%|█████████▉                       | 15090/50000 [2:44:16<6:18:01,  1.54it/s]


 30%|█████████▉                       | 15091/50000 [2:44:17<6:09:33,  1.57it/s]


 30%|█████████▉                       | 15092/50000 [2:44:17<5:59:16,  1.62it/s]


 30%|█████████▉                       | 15093/50000 [2:44:18<5:54:16,  1.64it/s]


 30%|█████████▉                       | 15094/50000 [2:44:19<5:51:49,  1.65it/s]


 30%|█████████▉                       | 15095/50000 [2:44:19<5:37:26,  1.72it/s]


 30%|█████████▉                       | 15096/50000 [2:44:20<6:01:03,  1.61it/s]


 30%|█████████▉                       | 15097/50000 [2:44:20<5:54:26,  1.64it/s]


 30%|█████████▉                       | 15098/50000 [2:44:21<5:50:51,  1.66it/s]


 30%|█████████▉                       | 15099/50000 [2:44:22<5:50:37,  1.66it/s]


 30%|█████████▉                       | 15100/50000 [2:44:22<5:54:45,  1.64it/s]
                                                                                
{'loss': 3.3344, 'grad_norm': 3.7846291065216064, 'learning_rate': 0.0006979999999999999, 'epoch': 0.79}

 30%|██████████                       | 15200/50000 [2:45:27<6:17:12,  1.54it/s]
                                                                                
{'loss': 3.3806, 'grad_norm': 3.648153066635132, 'learning_rate': 0.000696, 'epoch': 0.8}

 31%|██████████                       | 15300/50000 [2:46:32<6:47:29,  1.42it/s]
                                                                                
{'loss': 3.3303, 'grad_norm': 3.143373966217041, 'learning_rate': 0.000694, 'epoch': 0.8}

 31%|██████████▏                      | 15400/50000 [2:47:37<5:34:31,  1.72it/s]
                                                                                
{'loss': 3.3994, 'grad_norm': 3.476101875305176, 'learning_rate': 0.000692, 'epoch': 0.81}

 31%|██████████▏                      | 15400/50000 [2:47:37<5:34:31,  1.72it/s]


{'loss': 3.3794, 'grad_norm': 3.6031229496002197, 'learning_rate': 0.00069, 'epoch': 0.81}

 31%|██████████▏                      | 15500/50000 [2:48:42<6:17:21,  1.52it/s]


{'loss': 3.3671, 'grad_norm': 3.76164174079895, 'learning_rate': 0.0006879999999999999, 'epoch': 0.82}

 31%|██████████▎                      | 15600/50000 [2:49:47<5:52:59,  1.62it/s]


{'loss': 3.3891, 'grad_norm': 3.7847962379455566, 'learning_rate': 0.0006860000000000001, 'epoch': 0.82}

 31%|██████████▎                      | 15700/50000 [2:50:52<5:21:34,  1.78it/s]




 32%|██████████▍                      | 15795/50000 [2:51:53<6:07:38,  1.55it/s]


 32%|██████████▍                      | 15796/50000 [2:51:53<6:36:56,  1.44it/s]


 32%|██████████▍                      | 15797/50000 [2:51:54<6:25:00,  1.48it/s]


 32%|██████████▍                      | 15798/50000 [2:51:55<6:20:56,  1.50it/s]


 32%|██████████▍                      | 15799/50000 [2:51:55<6:08:22,  1.55it/s]


 32%|██████████▍                      | 15800/50000 [2:51:56<5:56:21,  1.60it/s]
                                                                                
{'loss': 3.4004, 'grad_norm': 3.279200315475464, 'learning_rate': 0.000684, 'epoch': 0.83}

 32%|██████████▍                      | 15800/50000 [2:51:56<5:56:21,  1.60it/s]


{'loss': 3.3796, 'grad_norm': 4.287536144256592, 'learning_rate': 0.0006820000000000001, 'epoch': 0.83}

 32%|██████████▍                      | 15900/50000 [2:53:02<7:00:55,  1.35it/s]


{'loss': 3.3531, 'grad_norm': 4.387000560760498, 'learning_rate': 0.00068, 'epoch': 0.84}

 32%|██████████▌                      | 16000/50000 [2:54:06<6:27:33,  1.46it/s]


{'loss': 3.3582, 'grad_norm': 6.557674884796143, 'learning_rate': 0.0006780000000000001, 'epoch': 0.84}

 32%|██████████▋                      | 16100/50000 [2:55:10<6:06:52,  1.54it/s]




 32%|██████████▋                      | 16164/50000 [2:55:52<6:26:32,  1.46it/s]


 32%|██████████▋                      | 16165/50000 [2:55:53<6:42:35,  1.40it/s]


 32%|██████████▋                      | 16166/50000 [2:55:53<6:15:00,  1.50it/s]


 32%|██████████▋                      | 16167/50000 [2:55:54<5:58:54,  1.57it/s]


 32%|██████████▋                      | 16168/50000 [2:55:54<6:03:34,  1.55it/s]


 32%|██████████▋                      | 16169/50000 [2:55:55<5:49:45,  1.61it/s]


 32%|██████████▋                      | 16170/50000 [2:55:55<5:50:47,  1.61it/s]


 32%|██████████▋                      | 16171/50000 [2:55:56<5:32:47,  1.69it/s]


 32%|██████████▋                      | 16172/50000 [2:55:57<5:19:02,  1.77it/s]


 32%|██████████▋                      | 16173/50000 [2:55:57<5:25:01,  1.73it/s]


 32%|██████████▋                      | 16174/50000 [2:55:58<5:29:23,  1.71it/s]


 32%|██████████▋                      | 16175/50000 [2:55:58<5:36:34,  1.67it/s]


 32%|██████████▋                      | 16176/50000 [2:55:59<5:59:06,  1.57it/s]


 32%|██████████▋                      | 16177/50000 [2:56:00<5:50:52,  1.61it/s]


 32%|██████████▋                      | 16178/50000 [2:56:00<5:54:40,  1.59it/s]


 32%|██████████▋                      | 16179/50000 [2:56:01<5:54:12,  1.59it/s]


 32%|██████████▋                      | 16180/50000 [2:56:02<6:33:23,  1.43it/s]


 32%|██████████▋                      | 16181/50000 [2:56:03<6:39:51,  1.41it/s]


 32%|██████████▋                      | 16182/50000 [2:56:03<6:27:56,  1.45it/s]


 32%|██████████▋                      | 16183/50000 [2:56:04<6:43:33,  1.40it/s]


 32%|██████████▋                      | 16184/50000 [2:56:04<6:08:39,  1.53it/s]


 32%|██████████▋                      | 16185/50000 [2:56:05<6:05:03,  1.54it/s]


 32%|██████████▋                      | 16186/50000 [2:56:06<5:54:00,  1.59it/s]


 32%|██████████▋                      | 16187/50000 [2:56:06<5:49:22,  1.61it/s]


 32%|██████████▋                      | 16188/50000 [2:56:07<5:54:38,  1.59it/s]


 32%|██████████▋                      | 16189/50000 [2:56:07<5:43:27,  1.64it/s]


 32%|██████████▋                      | 16190/50000 [2:56:08<5:47:27,  1.62it/s]


 32%|██████████▋                      | 16191/50000 [2:56:09<5:58:51,  1.57it/s]


 32%|██████████▋                      | 16192/50000 [2:56:09<5:39:24,  1.66it/s]


 32%|██████████▋                      | 16193/50000 [2:56:10<5:39:06,  1.66it/s]


 32%|██████████▋                      | 16194/50000 [2:56:11<6:16:21,  1.50it/s]


 32%|██████████▋                      | 16195/50000 [2:56:12<6:35:24,  1.42it/s]


 32%|██████████▋                      | 16196/50000 [2:56:12<6:45:34,  1.39it/s]


 32%|██████████▋                      | 16197/50000 [2:56:13<6:21:16,  1.48it/s]


 32%|██████████▋                      | 16198/50000 [2:56:14<6:44:46,  1.39it/s]


 32%|██████████▋                      | 16199/50000 [2:56:14<6:35:39,  1.42it/s]


 32%|██████████▋                      | 16200/50000 [2:56:15<6:16:22,  1.50it/s]
                                                                                
{'loss': 3.3982, 'grad_norm': 3.822265625, 'learning_rate': 0.0006760000000000001, 'epoch': 0.85}

{'loss': 3.3544, 'grad_norm': 3.0289227962493896, 'learning_rate': 0.000674, 'epoch': 0.85}

{'loss': 3.3272, 'grad_norm': 3.057971715927124, 'learning_rate': 0.0006720000000000001, 'epoch': 0.86}

{'loss': 3.3592, 'grad_norm': 2.9681665897369385, 'learning_rate': 0.00067, 'epoch': 0.86}

 33%|██████████▉                      | 16600/50000 [3:00:38<5:52:37,  1.58it/s]
                                                                                
{'loss': 3.3688, 'grad_norm': 3.236881732940674, 'learning_rate': 0.0006680000000000001, 'epoch': 0.87}

 33%|███████████                      | 16700/50000 [3:01:43<5:58:18,  1.55it/s]
                                                                                
{'loss': 3.3616, 'grad_norm': 3.8734288215637207, 'learning_rate': 0.000666, 'epoch': 0.87}

 34%|███████████                      | 16800/50000 [3:02:49<6:08:20,  1.50it/s]
                                                                                
{'loss': 3.341, 'grad_norm': 3.570673942565918, 'learning_rate': 0.0006640000000000001, 'epoch': 0.88}

 34%|███████████▏                     | 16900/50000 [3:03:54<5:34:53,  1.65it/s]
                                                                                
{'loss': 3.3494, 'grad_norm': 3.224595069885254, 'learning_rate': 0.000662, 'epoch': 0.88}

 34%|███████████▏                     | 17000/50000 [3:05:01<5:45:22,  1.59it/s]
                                                                                
{'loss': 3.3682, 'grad_norm': 2.9447460174560547, 'learning_rate': 0.00066, 'epoch': 0.89}

 34%|███████████▎                     | 17100/50000 [3:06:04<5:49:44,  1.57it/s]
                                                                                
{'loss': 3.3499, 'grad_norm': 2.940734624862671, 'learning_rate': 0.0006580000000000001, 'epoch': 0.9}

 34%|███████████▎                     | 17200/50000 [3:07:10<6:20:08,  1.44it/s]
                                                                                
{'loss': 3.3546, 'grad_norm': 2.9537675380706787, 'learning_rate': 0.000656, 'epoch': 0.9}



 35%|███████████▍                     | 17274/50000 [3:07:58<5:56:13,  1.53it/s]


 35%|███████████▍                     | 17275/50000 [3:07:59<5:50:52,  1.55it/s]


 35%|███████████▍                     | 17276/50000 [3:08:00<6:09:12,  1.48it/s]


 35%|███████████▍                     | 17277/50000 [3:08:00<6:10:32,  1.47it/s]


 35%|███████████▍                     | 17278/50000 [3:08:01<5:53:49,  1.54it/s]


 35%|███████████▍                     | 17279/50000 [3:08:02<6:28:30,  1.40it/s]


 35%|███████████▍                     | 17280/50000 [3:08:02<5:57:32,  1.53it/s]


 35%|███████████▍                     | 17281/50000 [3:08:03<5:43:56,  1.59it/s]


 35%|███████████▍                     | 17282/50000 [3:08:03<5:46:27,  1.57it/s]


 35%|███████████▍                     | 17283/50000 [3:08:04<5:21:08,  1.70it/s]


 35%|███████████▍                     | 17284/50000 [3:08:05<5:22:21,  1.69it/s]


 35%|███████████▍                     | 17285/50000 [3:08:05<5:46:19,  1.57it/s]


 35%|███████████▍                     | 17286/50000 [3:08:06<5:40:51,  1.60it/s]


 35%|███████████▍                     | 17287/50000 [3:08:06<5:34:07,  1.63it/s]


 35%|███████████▍                     | 17288/50000 [3:08:07<5:32:18,  1.64it/s]


 35%|███████████▍                     | 17289/50000 [3:08:08<5:43:25,  1.59it/s]


 35%|███████████▍                     | 17290/50000 [3:08:09<6:00:53,  1.51it/s]


 35%|███████████▍                     | 17291/50000 [3:08:09<5:45:47,  1.58it/s]


 35%|███████████▍                     | 17292/50000 [3:08:10<5:37:19,  1.62it/s]


 35%|███████████▍                     | 17293/50000 [3:08:10<5:48:00,  1.57it/s]


 35%|███████████▍                     | 17294/50000 [3:08:11<5:35:52,  1.62it/s]


 35%|███████████▍                     | 17295/50000 [3:08:11<5:30:20,  1.65it/s]


 35%|███████████▍                     | 17296/50000 [3:08:12<5:20:23,  1.70it/s]


 35%|███████████▍                     | 17297/50000 [3:08:13<5:21:26,  1.70it/s]


 35%|███████████▍                     | 17298/50000 [3:08:13<5:20:16,  1.70it/s]


 35%|███████████▍                     | 17299/50000 [3:08:14<5:29:46,  1.65it/s]


 35%|███████████▍                     | 17300/50000 [3:08:14<5:19:59,  1.70it/s]
                                                                                
{'loss': 3.3829, 'grad_norm': 3.32247257232666, 'learning_rate': 0.0006540000000000001, 'epoch': 0.91}

 35%|███████████▍                     | 17300/50000 [3:08:14<5:19:59,  1.70it/s]


{'loss': 3.3345, 'grad_norm': 3.285517930984497, 'learning_rate': 0.000652, 'epoch': 0.91}

 35%|███████████▍                     | 17400/50000 [3:09:18<5:44:14,  1.58it/s]


{'loss': 3.3934, 'grad_norm': 4.293366432189941, 'learning_rate': 0.0006500000000000001, 'epoch': 0.92}

 35%|███████████▌                     | 17500/50000 [3:10:22<5:32:20,  1.63it/s]


{'loss': 3.3859, 'grad_norm': 3.50226092338562, 'learning_rate': 0.000648, 'epoch': 0.92}

 35%|███████████▋                     | 17700/50000 [3:12:33<6:19:45,  1.42it/s]
                                                                                
{'loss': 3.3847, 'grad_norm': 3.536252975463867, 'learning_rate': 0.000646, 'epoch': 0.93}

 36%|███████████▋                     | 17800/50000 [3:13:37<6:26:14,  1.39it/s]
                                                                                
{'loss': 3.363, 'grad_norm': 2.84260630607605, 'learning_rate': 0.000644, 'epoch': 0.93}

 36%|███████████▊                     | 17900/50000 [3:14:43<5:26:20,  1.64it/s]
                                                                                
{'loss': 3.3351, 'grad_norm': 2.8508920669555664, 'learning_rate': 0.000642, 'epoch': 0.94}

 36%|███████████▉                     | 18000/50000 [3:15:47<5:36:35,  1.58it/s]
                                                                                
{'loss': 3.3479, 'grad_norm': 2.9373562335968018, 'learning_rate': 0.00064, 'epoch': 0.94}

 36%|███████████▉                     | 18100/50000 [3:16:52<6:37:41,  1.34it/s]


                                                                                
{'loss': 3.3393, 'grad_norm': 3.2303383350372314, 'learning_rate': 0.000638, 'epoch': 0.95}

 36%|████████████                     | 18200/50000 [3:17:57<5:36:44,  1.57it/s]
                                                                                
{'loss': 3.3314, 'grad_norm': 2.8357298374176025, 'learning_rate': 0.0006360000000000001, 'epoch': 0.95}

 37%|████████████                     | 18300/50000 [3:19:02<6:05:49,  1.44it/s]
                                                                                
{'loss': 3.3302, 'grad_norm': 2.9912188053131104, 'learning_rate': 0.000634, 'epoch': 0.96}



 37%|████████████▏                    | 18384/50000 [3:19:57<5:57:39,  1.47it/s]


 37%|████████████▏                    | 18385/50000 [3:19:57<5:48:26,  1.51it/s]


 37%|████████████▏                    | 18386/50000 [3:19:58<5:42:06,  1.54it/s]


 37%|████████████▏                    | 18387/50000 [3:19:59<5:25:40,  1.62it/s]


 37%|████████████▏                    | 18388/50000 [3:19:59<5:20:01,  1.65it/s]


 37%|████████████▏                    | 18389/50000 [3:20:00<5:17:04,  1.66it/s]


 37%|████████████▏                    | 18390/50000 [3:20:00<5:20:02,  1.65it/s]


 37%|████████████▏                    | 18391/50000 [3:20:01<5:42:26,  1.54it/s]


 37%|████████████▏                    | 18392/50000 [3:20:02<5:42:41,  1.54it/s]


 37%|████████████▏                    | 18393/50000 [3:20:02<5:38:03,  1.56it/s]


 37%|████████████▏                    | 18394/50000 [3:20:03<5:18:07,  1.66it/s]


 37%|████████████▏                    | 18395/50000 [3:20:04<5:17:37,  1.66it/s]


 37%|████████████▏                    | 18396/50000 [3:20:04<5:04:52,  1.73it/s]


 37%|████████████▏                    | 18397/50000 [3:20:05<5:25:28,  1.62it/s]


 37%|████████████▏                    | 18398/50000 [3:20:05<5:27:05,  1.61it/s]


 37%|████████████▏                    | 18399/50000 [3:20:06<5:21:02,  1.64it/s]


 37%|████████████▏                    | 18400/50000 [3:20:06<4:51:52,  1.80it/s]
                                                                                
{'loss': 3.3566, 'grad_norm': 3.3382015228271484, 'learning_rate': 0.000632, 'epoch': 0.96}

 37%|████████████▏                    | 18500/50000 [3:21:11<5:36:30,  1.56it/s]
                                                                                
{'loss': 3.3466, 'grad_norm': 3.8940205574035645, 'learning_rate': 0.00063, 'epoch': 0.97}

 37%|████████████▎                    | 18600/50000 [3:22:18<5:52:03,  1.49it/s]
                                                                                
{'loss': 3.348, 'grad_norm': 3.620234966278076, 'learning_rate': 0.000628, 'epoch': 0.97}

 37%|████████████▎                    | 18700/50000 [3:23:23<5:53:27,  1.48it/s]
                                                                                
{'loss': 3.3132, 'grad_norm': 3.930013656616211, 'learning_rate': 0.000626, 'epoch': 0.98}



 38%|████████████▍                    | 18753/50000 [3:23:58<5:37:49,  1.54it/s]


 38%|████████████▍                    | 18754/50000 [3:23:59<5:54:33,  1.47it/s]


 38%|████████████▍                    | 18755/50000 [3:24:00<5:59:00,  1.45it/s]


 38%|████████████▍                    | 18756/50000 [3:24:01<6:04:24,  1.43it/s]


 38%|████████████▍                    | 18757/50000 [3:24:01<5:48:58,  1.49it/s]


 38%|████████████▍                    | 18758/50000 [3:24:02<5:24:54,  1.60it/s]


 38%|████████████▍                    | 18759/50000 [3:24:02<5:20:15,  1.63it/s]


 38%|████████████▍                    | 18760/50000 [3:24:03<5:24:39,  1.60it/s]


 38%|████████████▍                    | 18761/50000 [3:24:04<5:18:26,  1.63it/s]


 38%|████████████▍                    | 18762/50000 [3:24:04<5:57:32,  1.46it/s]


 38%|████████████▍                    | 18763/50000 [3:24:05<5:43:15,  1.52it/s]


 38%|████████████▍                    | 18764/50000 [3:24:06<5:38:10,  1.54it/s]


 38%|████████████▍                    | 18765/50000 [3:24:06<5:29:37,  1.58it/s]


 38%|████████████▍                    | 18766/50000 [3:24:07<5:58:11,  1.45it/s]


 38%|████████████▍                    | 18767/50000 [3:24:08<5:44:40,  1.51it/s]


 38%|████████████▍                    | 18768/50000 [3:24:08<5:32:14,  1.57it/s]


 38%|████████████▍                    | 18769/50000 [3:24:09<5:37:29,  1.54it/s]


 38%|████████████▍                    | 18770/50000 [3:24:10<5:46:59,  1.50it/s]


 38%|████████████▍                    | 18771/50000 [3:24:10<5:35:47,  1.55it/s]


 38%|████████████▍                    | 18772/50000 [3:24:11<5:24:01,  1.61it/s]


 38%|████████████▍                    | 18773/50000 [3:24:11<5:33:33,  1.56it/s]


 38%|████████████▍                    | 18774/50000 [3:24:12<5:34:18,  1.56it/s]


 38%|████████████▍                    | 18775/50000 [3:24:13<5:46:06,  1.50it/s]


 38%|████████████▍                    | 18776/50000 [3:24:13<5:46:59,  1.50it/s]


 38%|████████████▍                    | 18777/50000 [3:24:14<5:44:36,  1.51it/s]


 38%|████████████▍                    | 18778/50000 [3:24:15<5:44:01,  1.51it/s]


 38%|████████████▍                    | 18779/50000 [3:24:15<5:25:48,  1.60it/s]


 38%|████████████▍                    | 18780/50000 [3:24:16<5:16:05,  1.65it/s]


 38%|████████████▍                    | 18781/50000 [3:24:17<5:24:03,  1.61it/s]


 38%|████████████▍                    | 18782/50000 [3:24:17<5:32:31,  1.56it/s]


 38%|████████████▍                    | 18783/50000 [3:24:18<5:51:49,  1.48it/s]


 38%|████████████▍                    | 18784/50000 [3:24:19<6:04:21,  1.43it/s]


 38%|████████████▍                    | 18785/50000 [3:24:19<5:59:48,  1.45it/s]


 38%|████████████▍                    | 18786/50000 [3:24:20<5:38:52,  1.54it/s]


 38%|████████████▍                    | 18787/50000 [3:24:21<5:30:05,  1.58it/s]


 38%|████████████▍                    | 18788/50000 [3:24:21<5:29:49,  1.58it/s]


 38%|████████████▍                    | 18789/50000 [3:24:22<5:23:45,  1.61it/s]


 38%|████████████▍                    | 18790/50000 [3:24:23<5:57:55,  1.45it/s]


 38%|████████████▍                    | 18791/50000 [3:24:23<5:37:33,  1.54it/s]


 38%|████████████▍                    | 18792/50000 [3:24:24<5:51:10,  1.48it/s]


 38%|████████████▍                    | 18793/50000 [3:24:25<5:59:54,  1.45it/s]


 38%|████████████▍                    | 18794/50000 [3:24:25<5:40:49,  1.53it/s]


 38%|████████████▍                    | 18795/50000 [3:24:26<5:35:24,  1.55it/s]


 38%|████████████▍                    | 18796/50000 [3:24:26<5:22:43,  1.61it/s]


 38%|████████████▍                    | 18797/50000 [3:24:27<5:28:35,  1.58it/s]


 38%|████████████▍                    | 18798/50000 [3:24:28<5:16:55,  1.64it/s]


 38%|████████████▍                    | 18799/50000 [3:24:28<4:57:31,  1.75it/s]


 38%|████████████▍                    | 18800/50000 [3:24:29<5:13:54,  1.66it/s]
                                                                                
{'loss': 3.3622, 'grad_norm': 3.2023394107818604, 'learning_rate': 0.000624, 'epoch': 0.98}

 38%|████████████▍                    | 18900/50000 [3:25:34<5:52:18,  1.47it/s]
                                                                                
{'loss': 3.3474, 'grad_norm': 3.1845602989196777, 'learning_rate': 0.000622, 'epoch': 0.99}

 38%|████████████▌                    | 19000/50000 [3:26:38<5:53:35,  1.46it/s]
                                                                                
{'loss': 3.3531, 'grad_norm': 3.507678747177124, 'learning_rate': 0.00062, 'epoch': 0.99}

 38%|████████████▌                    | 19100/50000 [3:27:42<5:51:29,  1.47it/s]
                                                                                
{'loss': 3.3407, 'grad_norm': 3.605889081954956, 'learning_rate': 0.0006180000000000001, 'epoch': 1.0}

 38%|████████████▌                    | 19100/50000 [3:27:42<5:51:29,  1.47it/s]


{'loss': 3.3304, 'grad_norm': 3.47051739692688, 'learning_rate': 0.000616, 'epoch': 1.01}

 38%|████████████▋                    | 19200/50000 [3:28:48<5:31:43,  1.55it/s]


{'loss': 3.3026, 'grad_norm': 2.837080240249634, 'learning_rate': 0.000614, 'epoch': 1.01}

 39%|████████████▋                    | 19300/50000 [3:29:52<5:55:48,  1.44it/s]


{'loss': 3.3178, 'grad_norm': 2.9808478355407715, 'learning_rate': 0.000612, 'epoch': 1.02}

 39%|████████████▊                    | 19400/50000 [3:30:58<5:28:26,  1.55it/s]




 39%|████████████▊                    | 19494/50000 [3:32:00<5:39:12,  1.50it/s]


 39%|████████████▊                    | 19495/50000 [3:32:01<5:45:37,  1.47it/s]


 39%|████████████▊                    | 19496/50000 [3:32:01<5:34:32,  1.52it/s]


 39%|████████████▊                    | 19497/50000 [3:32:02<6:04:32,  1.39it/s]


 39%|████████████▊                    | 19498/50000 [3:32:03<6:03:16,  1.40it/s]


 39%|████████████▊                    | 19499/50000 [3:32:03<5:43:05,  1.48it/s]


 39%|████████████▊                    | 19500/50000 [3:32:04<5:30:50,  1.54it/s]
                                                                                
{'loss': 3.2987, 'grad_norm': 2.99472713470459, 'learning_rate': 0.00061, 'epoch': 1.02}

 39%|████████████▉                    | 19600/50000 [3:33:10<5:45:19,  1.47it/s]
                                                                                
{'loss': 3.2946, 'grad_norm': 3.1691510677337646, 'learning_rate': 0.000608, 'epoch': 1.03}

 39%|█████████████                    | 19700/50000 [3:34:16<5:33:10,  1.52it/s]
                                                                                
{'loss': 3.3406, 'grad_norm': 4.049343109130859, 'learning_rate': 0.000606, 'epoch': 1.03}

 40%|█████████████                    | 19800/50000 [3:35:22<5:09:55,  1.62it/s]
                                                                                
{'loss': 3.2704, 'grad_norm': 6.449809551239014, 'learning_rate': 0.000604, 'epoch': 1.04}



 40%|█████████████                    | 19863/50000 [3:36:02<5:26:19,  1.54it/s]


 40%|█████████████                    | 19864/50000 [3:36:03<5:13:19,  1.60it/s]


 40%|█████████████                    | 19865/50000 [3:36:04<5:41:15,  1.47it/s]


 40%|█████████████                    | 19866/50000 [3:36:04<5:26:37,  1.54it/s]


 40%|█████████████                    | 19867/50000 [3:36:05<5:24:50,  1.55it/s]


 40%|█████████████                    | 19868/50000 [3:36:06<5:50:19,  1.43it/s]


 40%|█████████████                    | 19869/50000 [3:36:07<5:40:35,  1.47it/s]


 40%|█████████████                    | 19870/50000 [3:36:07<5:37:31,  1.49it/s]


 40%|█████████████                    | 19871/50000 [3:36:08<5:36:24,  1.49it/s]


 40%|█████████████                    | 19872/50000 [3:36:09<5:44:29,  1.46it/s]


 40%|█████████████                    | 19873/50000 [3:36:09<5:30:46,  1.52it/s]


 40%|█████████████                    | 19874/50000 [3:36:10<5:17:23,  1.58it/s]


 40%|█████████████                    | 19875/50000 [3:36:10<5:12:40,  1.61it/s]


 40%|█████████████                    | 19876/50000 [3:36:11<5:14:09,  1.60it/s]


 40%|█████████████                    | 19877/50000 [3:36:12<5:17:06,  1.58it/s]


 40%|█████████████                    | 19878/50000 [3:36:12<5:22:57,  1.55it/s]


 40%|█████████████                    | 19879/50000 [3:36:13<5:25:09,  1.54it/s]


 40%|█████████████                    | 19880/50000 [3:36:13<5:03:25,  1.65it/s]


 40%|█████████████                    | 19881/50000 [3:36:14<5:02:11,  1.66it/s]


 40%|█████████████                    | 19882/50000 [3:36:15<4:58:16,  1.68it/s]


 40%|█████████████                    | 19883/50000 [3:36:15<5:02:13,  1.66it/s]


 40%|█████████████                    | 19884/50000 [3:36:16<5:07:47,  1.63it/s]


 40%|█████████████                    | 19885/50000 [3:36:16<5:03:07,  1.66it/s]


 40%|█████████████                    | 19886/50000 [3:36:17<5:26:51,  1.54it/s]


 40%|█████████████▏                   | 19887/50000 [3:36:18<5:19:21,  1.57it/s]


 40%|█████████████▏                   | 19888/50000 [3:36:18<5:20:44,  1.56it/s]


 40%|█████████████▏                   | 19889/50000 [3:36:19<5:12:13,  1.61it/s]


 40%|█████████████▏                   | 19890/50000 [3:36:20<5:19:34,  1.57it/s]


 40%|█████████████▏                   | 19891/50000 [3:36:20<5:13:10,  1.60it/s]


 40%|█████████████▏                   | 19892/50000 [3:36:21<5:18:14,  1.58it/s]


 40%|█████████████▏                   | 19893/50000 [3:36:22<5:16:09,  1.59it/s]


 40%|█████████████▏                   | 19894/50000 [3:36:22<5:22:13,  1.56it/s]


 40%|█████████████▏                   | 19895/50000 [3:36:23<5:19:51,  1.57it/s]


 40%|█████████████▏                   | 19896/50000 [3:36:23<5:13:20,  1.60it/s]


 40%|█████████████▏                   | 19897/50000 [3:36:24<5:13:35,  1.60it/s]


 40%|█████████████▏                   | 19898/50000 [3:36:25<5:27:32,  1.53it/s]


 40%|█████████████▏                   | 19899/50000 [3:36:25<5:19:44,  1.57it/s]


 40%|█████████████▏                   | 19900/50000 [3:36:26<5:17:10,  1.58it/s]
                                                                                
{'loss': 3.3216, 'grad_norm': 3.0411453247070312, 'learning_rate': 0.000602, 'epoch': 1.04}

 40%|█████████████▏                   | 20000/50000 [3:37:33<5:11:26,  1.61it/s]
                                                                                
{'loss': 3.3009, 'grad_norm': 3.1901047229766846, 'learning_rate': 0.0006, 'epoch': 1.05}

***** Running Evaluation *****
  Num examples = 50
  Batch size = 16




{'eval_rouge-1': 31.773450000000004, 'eval_rouge-2': 7.092496, 'eval_rouge-l': 25.286636, 'eval_bleu-4': 0.03556758762554474, 'eval_runtime': 13.0912, 'eval_samples_per_second': 3.819, 'eval_steps_per_second': 0.306, 'epoch': 1.05}

Saving model checkpoint to ./output/tmp-checkpoint-20000


tokenizer config file saved in ./output/tmp-checkpoint-20000/tokenizer_config.json
Special tokens file saved in ./output/tmp-checkpoint-20000/special_tokens_map.json




 40%|█████████████▎                   | 20100/50000 [3:38:51<5:22:48,  1.54it/s]
                                                                                
{'loss': 3.3023, 'grad_norm': 2.718670129776001, 'learning_rate': 0.000598, 'epoch': 1.05}

 40%|█████████████▎                   | 20200/50000 [3:39:55<5:07:26,  1.62it/s]


                                                                                
{'loss': 3.3369, 'grad_norm': 3.3251335620880127, 'learning_rate': 0.000596, 'epoch': 1.06}

 41%|█████████████▍                   | 20300/50000 [3:41:00<5:13:13,  1.58it/s]
                                                                                
{'loss': 3.3207, 'grad_norm': 2.7541990280151367, 'learning_rate': 0.000594, 'epoch': 1.06}

 41%|█████████████▍                   | 20400/50000 [3:42:07<5:44:25,  1.43it/s]
                                                                                
{'loss': 3.3372, 'grad_norm': 2.9778053760528564, 'learning_rate': 0.000592, 'epoch': 1.07}

 41%|█████████████▌                   | 20500/50000 [3:43:13<5:20:42,  1.53it/s]
                                                                                
{'loss': 3.3126, 'grad_norm': 3.527111768722534, 'learning_rate': 0.00059, 'epoch': 1.07}



 41%|█████████████▌                   | 20588/50000 [3:44:10<5:20:27,  1.53it/s]


 41%|█████████████▌                   | 20589/50000 [3:44:11<5:20:06,  1.53it/s]


 41%|█████████████▌                   | 20590/50000 [3:44:12<5:21:46,  1.52it/s]


 41%|█████████████▌                   | 20591/50000 [3:44:12<5:24:46,  1.51it/s]


 41%|█████████████▌                   | 20592/50000 [3:44:13<5:31:35,  1.48it/s]


 41%|█████████████▌                   | 20593/50000 [3:44:14<5:08:54,  1.59it/s]


 41%|█████████████▌                   | 20594/50000 [3:44:14<5:14:48,  1.56it/s]


 41%|█████████████▌                   | 20595/50000 [3:44:15<5:52:31,  1.39it/s]


 41%|█████████████▌                   | 20596/50000 [3:44:16<5:34:14,  1.47it/s]


 41%|█████████████▌                   | 20597/50000 [3:44:16<5:17:46,  1.54it/s]


 41%|█████████████▌                   | 20598/50000 [3:44:17<5:34:28,  1.47it/s]


 41%|█████████████▌                   | 20599/50000 [3:44:18<6:31:57,  1.25it/s]


 41%|█████████████▌                   | 20600/50000 [3:44:19<5:54:35,  1.38it/s]
                                                                                
{'loss': 3.3156, 'grad_norm': 3.5688345432281494, 'learning_rate': 0.000588, 'epoch': 1.08}

 41%|█████████████▌                   | 20600/50000 [3:44:19<5:54:35,  1.38it/s]


{'loss': 3.2986, 'grad_norm': 3.981605291366577, 'learning_rate': 0.0005859999999999999, 'epoch': 1.08}

 41%|█████████████▋                   | 20700/50000 [3:45:22<5:12:57,  1.56it/s]


{'loss': 3.3014, 'grad_norm': 3.0755059719085693, 'learning_rate': 0.000584, 'epoch': 1.09}

 42%|█████████████▋                   | 20800/50000 [3:46:28<5:12:27,  1.56it/s]


{'loss': 3.3059, 'grad_norm': 3.211527109146118, 'learning_rate': 0.0005819999999999999, 'epoch': 1.09}

 42%|█████████████▊                   | 20900/50000 [3:47:34<5:18:04,  1.52it/s]




 42%|█████████████▊                   | 20957/50000 [3:48:10<5:23:26,  1.50it/s]


 42%|█████████████▊                   | 20958/50000 [3:48:11<5:11:23,  1.55it/s]


 42%|█████████████▊                   | 20959/50000 [3:48:11<5:12:59,  1.55it/s]


 42%|█████████████▊                   | 20960/50000 [3:48:12<5:11:25,  1.55it/s]


 42%|█████████████▊                   | 20961/50000 [3:48:12<4:52:47,  1.65it/s]


 42%|█████████████▊                   | 20962/50000 [3:48:13<4:57:13,  1.63it/s]


 42%|█████████████▊                   | 20963/50000 [3:48:14<5:15:28,  1.53it/s]


 42%|█████████████▊                   | 20964/50000 [3:48:15<5:40:41,  1.42it/s]


 42%|█████████████▊                   | 20965/50000 [3:48:15<5:32:34,  1.46it/s]


 42%|█████████████▊                   | 20966/50000 [3:48:16<5:14:04,  1.54it/s]


 42%|█████████████▊                   | 20967/50000 [3:48:17<5:19:27,  1.51it/s]


 42%|█████████████▊                   | 20968/50000 [3:48:17<5:21:26,  1.51it/s]


 42%|█████████████▊                   | 20969/50000 [3:48:18<5:16:00,  1.53it/s]


 42%|█████████████▊                   | 20970/50000 [3:48:18<5:06:36,  1.58it/s]


 42%|█████████████▊                   | 20971/50000 [3:48:19<5:13:45,  1.54it/s]


 42%|█████████████▊                   | 20972/50000 [3:48:20<4:55:21,  1.64it/s]


 42%|█████████████▊                   | 20973/50000 [3:48:20<4:54:51,  1.64it/s]


 42%|█████████████▊                   | 20974/50000 [3:48:21<4:47:13,  1.68it/s]


 42%|█████████████▊                   | 20975/50000 [3:48:22<5:08:50,  1.57it/s]


 42%|█████████████▊                   | 20976/50000 [3:48:22<5:54:34,  1.36it/s]


 42%|█████████████▊                   | 20977/50000 [3:48:23<5:36:16,  1.44it/s]


 42%|█████████████▊                   | 20978/50000 [3:48:24<5:32:40,  1.45it/s]


 42%|█████████████▊                   | 20979/50000 [3:48:24<5:25:35,  1.49it/s]


 42%|█████████████▊                   | 20980/50000 [3:48:25<5:15:04,  1.54it/s]


 42%|█████████████▊                   | 20981/50000 [3:48:26<5:07:41,  1.57it/s]


 42%|█████████████▊                   | 20982/50000 [3:48:26<5:02:33,  1.60it/s]


 42%|█████████████▊                   | 20983/50000 [3:48:27<5:17:38,  1.52it/s]


 42%|█████████████▊                   | 20984/50000 [3:48:28<5:15:02,  1.54it/s]


 42%|█████████████▊                   | 20985/50000 [3:48:28<5:14:09,  1.54it/s]


 42%|█████████████▊                   | 20986/50000 [3:48:29<5:05:23,  1.58it/s]


 42%|█████████████▊                   | 20987/50000 [3:48:29<4:48:39,  1.68it/s]


 42%|█████████████▊                   | 20988/50000 [3:48:30<4:44:56,  1.70it/s]


 42%|█████████████▊                   | 20989/50000 [3:48:31<5:13:04,  1.54it/s]


 42%|█████████████▊                   | 20990/50000 [3:48:31<5:23:35,  1.49it/s]


 42%|█████████████▊                   | 20991/50000 [3:48:32<5:50:53,  1.38it/s]


 42%|█████████████▊                   | 20992/50000 [3:48:33<5:38:38,  1.43it/s]


 42%|█████████████▊                   | 20993/50000 [3:48:34<5:34:41,  1.44it/s]


 42%|█████████████▊                   | 20994/50000 [3:48:34<5:30:01,  1.46it/s]


 42%|█████████████▊                   | 20995/50000 [3:48:35<5:13:00,  1.54it/s]


 42%|█████████████▊                   | 20996/50000 [3:48:36<5:39:31,  1.42it/s]


 42%|█████████████▊                   | 20997/50000 [3:48:36<5:35:11,  1.44it/s]


 42%|█████████████▊                   | 20998/50000 [3:48:37<5:21:13,  1.50it/s]


 42%|█████████████▊                   | 20999/50000 [3:48:37<5:12:21,  1.55it/s]


 42%|█████████████▊                   | 21000/50000 [3:48:38<5:10:55,  1.55it/s]


                                                                                
{'loss': 3.3521, 'grad_norm': 3.0954244136810303, 'learning_rate': 0.00058, 'epoch': 1.1}

 42%|█████████████▉                   | 21100/50000 [3:49:44<5:21:39,  1.50it/s]
                                                                                
{'loss': 3.2819, 'grad_norm': 3.6837351322174072, 'learning_rate': 0.000578, 'epoch': 1.1}

 42%|█████████████▉                   | 21200/50000 [3:50:49<5:21:12,  1.49it/s]
                                                                                
{'loss': 3.3336, 'grad_norm': 2.845824718475342, 'learning_rate': 0.000576, 'epoch': 1.11}

 43%|██████████████                   | 21300/50000 [3:51:55<5:47:38,  1.38it/s]
                                                                                
{'loss': 3.3216, 'grad_norm': 3.5086886882781982, 'learning_rate': 0.000574, 'epoch': 1.12}

 43%|██████████████                   | 21300/50000 [3:51:55<5:47:38,  1.38it/s]


 43%|██████████████                   | 21400/50000 [3:53:00<4:58:39,  1.60it/s]
                                                                                
{'loss': 3.3052, 'grad_norm': 4.046874523162842, 'learning_rate': 0.0005719999999999999, 'epoch': 1.12}

 43%|██████████████▏                  | 21500/50000 [3:54:05<5:30:10,  1.44it/s]
                                                                                
{'loss': 3.324, 'grad_norm': 3.207218647003174, 'learning_rate': 0.00057, 'epoch': 1.13}

 43%|██████████████▎                  | 21600/50000 [3:55:12<5:33:17,  1.42it/s]
                                                                                
{'loss': 3.3236, 'grad_norm': 3.0322442054748535, 'learning_rate': 0.0005679999999999999, 'epoch': 1.13}

 43%|██████████████▎                  | 21697/50000 [3:56:15<4:43:47,  1.66it/s]


 43%|██████████████▎                  | 21698/50000 [3:56:16<5:00:32,  1.57it/s]


 43%|██████████████▎                  | 21699/50000 [3:56:17<4:54:00,  1.60it/s]


 43%|██████████████▎                  | 21700/50000 [3:56:17<4:44:10,  1.66it/s]
                                                                                
{'loss': 3.3107, 'grad_norm': 3.063485860824585, 'learning_rate': 0.000566, 'epoch': 1.14}

 43%|██████████████▎                  | 21700/50000 [3:56:17<4:44:10,  1.66it/s]


{'loss': 3.3263, 'grad_norm': 3.03725266456604, 'learning_rate': 0.0005639999999999999, 'epoch': 1.14}

 44%|██████████████▍                  | 21800/50000 [3:57:22<4:50:41,  1.62it/s]


{'loss': 3.3251, 'grad_norm': 3.045696258544922, 'learning_rate': 0.0005620000000000001, 'epoch': 1.15}

 44%|██████████████▍                  | 21900/50000 [3:58:26<4:39:50,  1.67it/s]


{'loss': 3.3155, 'grad_norm': 3.733065128326416, 'learning_rate': 0.0005600000000000001, 'epoch': 1.15}

 44%|██████████████▌                  | 22000/50000 [3:59:30<5:08:27,  1.51it/s]




 44%|██████████████▌                  | 22067/50000 [4:00:14<5:01:49,  1.54it/s]


 44%|██████████████▌                  | 22068/50000 [4:00:15<5:01:51,  1.54it/s]


 44%|██████████████▌                  | 22069/50000 [4:00:15<5:06:11,  1.52it/s]


 44%|██████████████▌                  | 22070/50000 [4:00:16<5:21:57,  1.45it/s]


 44%|██████████████▌                  | 22071/50000 [4:00:17<5:20:19,  1.45it/s]


 44%|██████████████▌                  | 22072/50000 [4:00:18<5:18:15,  1.46it/s]


 44%|██████████████▌                  | 22073/50000 [4:00:18<5:00:58,  1.55it/s]


 44%|██████████████▌                  | 22074/50000 [4:00:19<5:03:59,  1.53it/s]


 44%|██████████████▌                  | 22075/50000 [4:00:20<5:05:57,  1.52it/s]


 44%|██████████████▌                  | 22076/50000 [4:00:20<5:29:09,  1.41it/s]


 44%|██████████████▌                  | 22077/50000 [4:00:21<5:11:50,  1.49it/s]


 44%|██████████████▌                  | 22078/50000 [4:00:22<5:05:53,  1.52it/s]


 44%|██████████████▌                  | 22079/50000 [4:00:22<4:51:57,  1.59it/s]


 44%|██████████████▌                  | 22080/50000 [4:00:23<4:46:36,  1.62it/s]


 44%|██████████████▌                  | 22081/50000 [4:00:24<5:28:56,  1.41it/s]


 44%|██████████████▌                  | 22082/50000 [4:00:24<5:30:14,  1.41it/s]


 44%|██████████████▌                  | 22083/50000 [4:00:25<5:18:48,  1.46it/s]


 44%|██████████████▌                  | 22084/50000 [4:00:26<5:03:37,  1.53it/s]


 44%|██████████████▌                  | 22085/50000 [4:00:26<4:59:14,  1.55it/s]


 44%|██████████████▌                  | 22086/50000 [4:00:27<5:04:52,  1.53it/s]


 44%|██████████████▌                  | 22087/50000 [4:00:28<5:16:14,  1.47it/s]


 44%|██████████████▌                  | 22088/50000 [4:00:28<5:00:02,  1.55it/s]


 44%|██████████████▌                  | 22089/50000 [4:00:29<5:02:47,  1.54it/s]


 44%|██████████████▌                  | 22090/50000 [4:00:30<5:11:48,  1.49it/s]


 44%|██████████████▌                  | 22091/50000 [4:00:30<5:00:56,  1.55it/s]


 44%|██████████████▌                  | 22092/50000 [4:00:31<5:09:37,  1.50it/s]


 44%|██████████████▌                  | 22093/50000 [4:00:31<4:44:25,  1.64it/s]


 44%|██████████████▌                  | 22094/50000 [4:00:32<5:00:02,  1.55it/s]


 44%|██████████████▌                  | 22095/50000 [4:00:33<4:53:49,  1.58it/s]


 44%|██████████████▌                  | 22096/50000 [4:00:33<4:41:40,  1.65it/s]


 44%|██████████████▌                  | 22097/50000 [4:00:34<4:43:45,  1.64it/s]


 44%|██████████████▌                  | 22098/50000 [4:00:34<4:41:39,  1.65it/s]


 44%|██████████████▌                  | 22099/50000 [4:00:35<4:35:47,  1.69it/s]


 44%|██████████████▌                  | 22100/50000 [4:00:36<4:33:35,  1.70it/s]
                                                                                
{'loss': 3.3126, 'grad_norm': 3.293687582015991, 'learning_rate': 0.000558, 'epoch': 1.16}

 44%|██████████████▋                  | 22200/50000 [4:01:39<4:46:23,  1.62it/s]
                                                                                
{'loss': 3.3276, 'grad_norm': 2.9788389205932617, 'learning_rate': 0.0005560000000000001, 'epoch': 1.16}

 45%|██████████████▋                  | 22300/50000 [4:02:44<4:32:32,  1.69it/s]
                                                                                
{'loss': 3.3446, 'grad_norm': 4.03386116027832, 'learning_rate': 0.000554, 'epoch': 1.17}

 45%|██████████████▊                  | 22400/50000 [4:03:50<4:38:31,  1.65it/s]
                                                                                
{'loss': 3.3328, 'grad_norm': 7.551347255706787, 'learning_rate': 0.0005520000000000001, 'epoch': 1.17}

 45%|██████████████▊                  | 22500/50000 [4:04:55<4:38:09,  1.65it/s]
                                                                                
{'loss': 3.3141, 'grad_norm': 3.2756407260894775, 'learning_rate': 0.00055, 'epoch': 1.18}

 45%|██████████████▉                  | 22600/50000 [4:06:00<5:46:08,  1.32it/s]
                                                                                
{'loss': 3.3281, 'grad_norm': 3.163090944290161, 'learning_rate': 0.0005480000000000001, 'epoch': 1.18}

 45%|██████████████▉                  | 22700/50000 [4:07:05<4:54:45,  1.54it/s]
                                                                                
{'loss': 3.3037, 'grad_norm': 3.345813751220703, 'learning_rate': 0.000546, 'epoch': 1.19}

 46%|███████████████                  | 22800/50000 [4:08:11<5:20:26,  1.41it/s]


                                                                                
{'loss': 3.3462, 'grad_norm': 4.81316614151001, 'learning_rate': 0.0005440000000000001, 'epoch': 1.19}

 46%|███████████████                  | 22800/50000 [4:08:11<5:20:26,  1.41it/s]


{'loss': 3.3354, 'grad_norm': 3.582690954208374, 'learning_rate': 0.0005420000000000001, 'epoch': 1.2}

 46%|███████████████                  | 22900/50000 [4:09:16<5:42:58,  1.32it/s]


{'loss': 3.3109, 'grad_norm': 3.3030734062194824, 'learning_rate': 0.00054, 'epoch': 1.2}

 46%|███████████████▏                 | 23000/50000 [4:10:20<4:47:42,  1.56it/s]


{'loss': 3.3326, 'grad_norm': 3.688192844390869, 'learning_rate': 0.0005380000000000001, 'epoch': 1.21}

 46%|███████████████▏                 | 23100/50000 [4:11:26<4:50:55,  1.54it/s]




 46%|███████████████▎                 | 23177/50000 [4:12:16<4:32:04,  1.64it/s]


 46%|███████████████▎                 | 23178/50000 [4:12:17<4:47:01,  1.56it/s]


 46%|███████████████▎                 | 23179/50000 [4:12:17<4:39:22,  1.60it/s]


 46%|███████████████▎                 | 23180/50000 [4:12:18<4:42:25,  1.58it/s]


 46%|███████████████▎                 | 23181/50000 [4:12:19<4:43:43,  1.58it/s]


 46%|███████████████▎                 | 23182/50000 [4:12:19<4:54:07,  1.52it/s]


 46%|███████████████▎                 | 23183/50000 [4:12:20<4:55:24,  1.51it/s]


 46%|███████████████▎                 | 23184/50000 [4:12:21<5:05:06,  1.46it/s]


 46%|███████████████▎                 | 23185/50000 [4:12:21<4:53:40,  1.52it/s]


 46%|███████████████▎                 | 23186/50000 [4:12:22<4:54:05,  1.52it/s]


 46%|███████████████▎                 | 23187/50000 [4:12:23<5:25:46,  1.37it/s]


 46%|███████████████▎                 | 23188/50000 [4:12:24<5:11:16,  1.44it/s]


 46%|███████████████▎                 | 23189/50000 [4:12:24<5:01:54,  1.48it/s]


 46%|███████████████▎                 | 23190/50000 [4:12:25<4:48:36,  1.55it/s]


 46%|███████████████▎                 | 23191/50000 [4:12:25<4:46:53,  1.56it/s]


 46%|███████████████▎                 | 23192/50000 [4:12:26<5:01:13,  1.48it/s]


 46%|███████████████▎                 | 23193/50000 [4:12:27<5:34:01,  1.34it/s]


 46%|███████████████▎                 | 23194/50000 [4:12:28<5:23:00,  1.38it/s]


 46%|███████████████▎                 | 23195/50000 [4:12:28<5:07:36,  1.45it/s]


 46%|███████████████▎                 | 23196/50000 [4:12:29<5:04:24,  1.47it/s]


 46%|███████████████▎                 | 23197/50000 [4:12:30<5:11:53,  1.43it/s]


 46%|███████████████▎                 | 23198/50000 [4:12:31<5:29:21,  1.36it/s]


 46%|███████████████▎                 | 23199/50000 [4:12:31<5:11:56,  1.43it/s]


 46%|███████████████▎                 | 23200/50000 [4:12:32<4:51:24,  1.53it/s]
                                                                                
{'loss': 3.3396, 'grad_norm': 3.9184441566467285, 'learning_rate': 0.000536, 'epoch': 1.21}

 46%|███████████████▎                 | 23200/50000 [4:12:32<4:51:24,  1.53it/s]


{'loss': 3.3463, 'grad_norm': 3.735706090927124, 'learning_rate': 0.0005340000000000001, 'epoch': 1.22}

 47%|███████████████▍                 | 23300/50000 [4:13:38<4:50:47,  1.53it/s]


{'loss': 3.3137, 'grad_norm': 3.4105279445648193, 'learning_rate': 0.000532, 'epoch': 1.23}

 47%|███████████████▍                 | 23400/50000 [4:14:43<4:55:47,  1.50it/s]


{'loss': 3.3154, 'grad_norm': 3.015014171600342, 'learning_rate': 0.0005300000000000001, 'epoch': 1.23}

 47%|███████████████▌                 | 23500/50000 [4:15:47<5:17:34,  1.39it/s]


{'loss': 3.3092, 'grad_norm': 3.314993143081665, 'learning_rate': 0.000528, 'epoch': 1.24}

 47%|███████████████▌                 | 23600/50000 [4:16:52<4:27:42,  1.64it/s]


{'loss': 3.3393, 'grad_norm': 3.0866825580596924, 'learning_rate': 0.000526, 'epoch': 1.24}

 47%|███████████████▋                 | 23700/50000 [4:17:56<4:36:01,  1.59it/s]


{'loss': 3.2708, 'grad_norm': 3.103687286376953, 'learning_rate': 0.000524, 'epoch': 1.25}

 48%|███████████████▋                 | 23800/50000 [4:19:02<5:01:55,  1.45it/s]


{'loss': 3.3134, 'grad_norm': 3.136096954345703, 'learning_rate': 0.000522, 'epoch': 1.25}

 48%|███████████████▊                 | 24000/50000 [4:21:12<4:51:47,  1.49it/s]
                                                                                
{'loss': 3.2826, 'grad_norm': 5.710281848907471, 'learning_rate': 0.0005200000000000001, 'epoch': 1.26}

 48%|███████████████▉                 | 24100/50000 [4:22:16<4:40:10,  1.54it/s]
                                                                                
{'loss': 3.3049, 'grad_norm': 3.7997970581054688, 'learning_rate': 0.000518, 'epoch': 1.26}

 48%|███████████████▉                 | 24200/50000 [4:23:21<5:00:57,  1.43it/s]
                                                                                
{'loss': 3.3192, 'grad_norm': 3.1473965644836426, 'learning_rate': 0.0005160000000000001, 'epoch': 1.27}

 48%|███████████████▉                 | 24200/50000 [4:23:21<5:00:57,  1.43it/s]


 48%|███████████████▉                 | 24201/50000 [4:23:22<5:07:18,  1.40it/s]


 48%|███████████████▉                 | 24202/50000 [4:23:22<5:01:49,  1.42it/s]


 48%|███████████████▉                 | 24203/50000 [4:23:23<4:58:08,  1.44it/s]


 48%|███████████████▉                 | 24204/50000 [4:23:24<5:04:03,  1.41it/s]


 48%|███████████████▉                 | 24205/50000 [4:23:24<4:49:54,  1.48it/s]


 48%|███████████████▉                 | 24206/50000 [4:23:25<4:39:04,  1.54it/s]


 48%|███████████████▉                 | 24207/50000 [4:23:26<4:26:14,  1.61it/s]


 48%|███████████████▉                 | 24208/50000 [4:23:26<4:28:21,  1.60it/s]


 48%|███████████████▉                 | 24209/50000 [4:23:27<4:32:10,  1.58it/s]


 48%|███████████████▉                 | 24210/50000 [4:23:27<4:27:29,  1.61it/s]


 48%|███████████████▉                 | 24211/50000 [4:23:28<4:22:49,  1.64it/s]


 48%|███████████████▉                 | 24212/50000 [4:23:29<4:16:09,  1.68it/s]


 48%|███████████████▉                 | 24213/50000 [4:23:29<4:37:53,  1.55it/s]


 48%|███████████████▉                 | 24214/50000 [4:23:30<4:25:52,  1.62it/s]


 48%|███████████████▉                 | 24215/50000 [4:23:31<4:31:12,  1.58it/s]


 48%|███████████████▉                 | 24216/50000 [4:23:31<4:44:13,  1.51it/s]


 48%|███████████████▉                 | 24217/50000 [4:23:32<4:44:04,  1.51it/s]


 48%|███████████████▉                 | 24218/50000 [4:23:33<4:42:23,  1.52it/s]


 48%|███████████████▉                 | 24219/50000 [4:23:33<4:43:35,  1.52it/s]


 48%|███████████████▉                 | 24220/50000 [4:23:34<4:35:31,  1.56it/s]


 48%|███████████████▉                 | 24221/50000 [4:23:35<4:35:33,  1.56it/s]


 48%|███████████████▉                 | 24222/50000 [4:23:35<4:38:49,  1.54it/s]


 48%|███████████████▉                 | 24223/50000 [4:23:36<4:51:39,  1.47it/s]


 48%|███████████████▉                 | 24224/50000 [4:23:37<4:55:45,  1.45it/s]


 48%|███████████████▉                 | 24225/50000 [4:23:37<4:51:22,  1.47it/s]


 48%|███████████████▉                 | 24226/50000 [4:23:38<4:45:02,  1.51it/s]


 48%|███████████████▉                 | 24227/50000 [4:23:39<4:36:27,  1.55it/s]


 48%|███████████████▉                 | 24228/50000 [4:23:39<4:24:47,  1.62it/s]


 48%|███████████████▉                 | 24229/50000 [4:23:40<4:53:00,  1.47it/s]


 48%|███████████████▉                 | 24230/50000 [4:23:41<4:43:19,  1.52it/s]


 48%|███████████████▉                 | 24231/50000 [4:23:41<4:31:43,  1.58it/s]


 48%|███████████████▉                 | 24232/50000 [4:23:42<4:42:42,  1.52it/s]


 48%|███████████████▉                 | 24233/50000 [4:23:43<5:09:45,  1.39it/s]


 48%|███████████████▉                 | 24234/50000 [4:23:44<5:21:25,  1.34it/s]


 48%|███████████████▉                 | 24235/50000 [4:23:44<5:19:28,  1.34it/s]


 48%|███████████████▉                 | 24236/50000 [4:23:45<5:09:25,  1.39it/s]


 48%|███████████████▉                 | 24237/50000 [4:23:45<4:49:10,  1.48it/s]


 48%|███████████████▉                 | 24238/50000 [4:23:46<4:48:02,  1.49it/s]


{'loss': 3.3016, 'grad_norm': 3.1518893241882324, 'learning_rate': 0.000514, 'epoch': 1.27}

 49%|████████████████                 | 24300/50000 [4:24:27<4:45:52,  1.50it/s]


{'loss': 3.2957, 'grad_norm': 3.146627187728882, 'learning_rate': 0.000512, 'epoch': 1.28}

 49%|████████████████                 | 24400/50000 [4:25:31<4:15:54,  1.67it/s]


{'loss': 3.3383, 'grad_norm': 2.9761409759521484, 'learning_rate': 0.00051, 'epoch': 1.28}

 49%|████████████████▏                | 24500/50000 [4:26:35<5:13:03,  1.36it/s]


{'loss': 3.2597, 'grad_norm': 3.022512674331665, 'learning_rate': 0.000508, 'epoch': 1.29}

 49%|████████████████▏                | 24600/50000 [4:27:39<4:17:53,  1.64it/s]




 49%|████████████████▎                | 24656/50000 [4:28:16<4:44:59,  1.48it/s]


 49%|████████████████▎                | 24657/50000 [4:28:17<4:31:39,  1.55it/s]


 49%|████████████████▎                | 24658/50000 [4:28:17<4:23:55,  1.60it/s]


 49%|████████████████▎                | 24659/50000 [4:28:18<4:29:03,  1.57it/s]


 49%|████████████████▎                | 24660/50000 [4:28:18<4:29:18,  1.57it/s]


 49%|████████████████▎                | 24661/50000 [4:28:19<4:18:21,  1.63it/s]


 49%|████████████████▎                | 24662/50000 [4:28:20<4:13:50,  1.66it/s]


 49%|████████████████▎                | 24663/50000 [4:28:20<4:08:17,  1.70it/s]


 49%|████████████████▎                | 24664/50000 [4:28:21<4:23:36,  1.60it/s]


 49%|████████████████▎                | 24665/50000 [4:28:22<4:25:29,  1.59it/s]


 49%|████████████████▎                | 24666/50000 [4:28:22<4:30:21,  1.56it/s]


 49%|████████████████▎                | 24667/50000 [4:28:23<4:33:34,  1.54it/s]


 49%|████████████████▎                | 24668/50000 [4:28:24<4:36:34,  1.53it/s]


 49%|████████████████▎                | 24669/50000 [4:28:24<4:20:37,  1.62it/s]


 49%|████████████████▎                | 24670/50000 [4:28:25<4:40:36,  1.50it/s]


 49%|████████████████▎                | 24671/50000 [4:28:25<4:41:47,  1.50it/s]


 49%|████████████████▎                | 24672/50000 [4:28:26<4:23:41,  1.60it/s]


 49%|████████████████▎                | 24673/50000 [4:28:27<4:35:11,  1.53it/s]


 49%|████████████████▎                | 24674/50000 [4:28:27<4:26:26,  1.58it/s]


 49%|████████████████▎                | 24675/50000 [4:28:28<4:19:28,  1.63it/s]


 49%|████████████████▎                | 24676/50000 [4:28:28<4:01:25,  1.75it/s]


 49%|████████████████▎                | 24677/50000 [4:28:29<4:25:28,  1.59it/s]


 49%|████████████████▎                | 24678/50000 [4:28:30<4:25:10,  1.59it/s]


 49%|████████████████▎                | 24679/50000 [4:28:31<4:48:03,  1.47it/s]


 49%|████████████████▎                | 24680/50000 [4:28:31<4:45:46,  1.48it/s]


 49%|████████████████▎                | 24681/50000 [4:28:32<4:49:44,  1.46it/s]


 49%|████████████████▎                | 24682/50000 [4:28:33<4:52:37,  1.44it/s]


 49%|████████████████▎                | 24683/50000 [4:28:33<4:55:43,  1.43it/s]


 49%|████████████████▎                | 24684/50000 [4:28:34<4:51:05,  1.45it/s]


 49%|████████████████▎                | 24685/50000 [4:28:35<4:47:06,  1.47it/s]


 49%|████████████████▎                | 24686/50000 [4:28:35<4:34:43,  1.54it/s]


 49%|████████████████▎                | 24687/50000 [4:28:36<4:38:53,  1.51it/s]


 49%|████████████████▎                | 24688/50000 [4:28:37<5:02:32,  1.39it/s]


 49%|████████████████▎                | 24689/50000 [4:28:37<4:52:38,  1.44it/s]


 49%|████████████████▎                | 24690/50000 [4:28:38<4:28:34,  1.57it/s]


 49%|████████████████▎                | 24691/50000 [4:28:39<4:27:16,  1.58it/s]


 49%|████████████████▎                | 24692/50000 [4:28:39<4:20:01,  1.62it/s]


 49%|████████████████▎                | 24693/50000 [4:28:40<4:25:15,  1.59it/s]


 49%|████████████████▎                | 24694/50000 [4:28:40<4:25:42,  1.59it/s]


 49%|████████████████▎                | 24695/50000 [4:28:41<4:40:08,  1.51it/s]


 49%|████████████████▎                | 24696/50000 [4:28:42<4:53:34,  1.44it/s]


 49%|████████████████▎                | 24697/50000 [4:28:43<4:55:20,  1.43it/s]


 49%|████████████████▎                | 24698/50000 [4:28:43<4:59:06,  1.41it/s]


 49%|████████████████▎                | 24699/50000 [4:28:44<4:50:09,  1.45it/s]


 49%|████████████████▎                | 24700/50000 [4:28:45<4:47:57,  1.46it/s]
                                                                                
{'loss': 3.2534, 'grad_norm': 3.819319725036621, 'learning_rate': 0.000506, 'epoch': 1.29}

 50%|████████████████▎                | 24800/50000 [4:29:50<4:34:51,  1.53it/s]
                                                                                
{'loss': 3.2918, 'grad_norm': 2.7377092838287354, 'learning_rate': 0.000504, 'epoch': 1.3}

 50%|████████████████▍                | 24900/50000 [4:30:55<4:36:41,  1.51it/s]
                                                                                
{'loss': 3.3, 'grad_norm': 3.0606138706207275, 'learning_rate': 0.0005020000000000001, 'epoch': 1.3}

 50%|████████████████▌                | 25000/50000 [4:32:01<5:02:17,  1.38it/s]
                                                                                
{'loss': 3.2704, 'grad_norm': 3.3508472442626953, 'learning_rate': 0.0005, 'epoch': 1.31}

***** Running Evaluation *****
  Num examples = 50
  Batch size = 16




{'eval_rouge-1': 32.171838, 'eval_rouge-2': 7.428891999999999, 'eval_rouge-l': 26.023281999999995, 'eval_bleu-4': 0.03265535400522058, 'eval_runtime': 8.0046, 'eval_samples_per_second': 6.246, 'eval_steps_per_second': 0.5, 'epoch': 1.31}

Saving model checkpoint to ./output/tmp-checkpoint-25000


tokenizer config file saved in ./output/tmp-checkpoint-25000/tokenizer_config.json
Special tokens file saved in ./output/tmp-checkpoint-25000/special_tokens_map.json



 50%|████████████████▌                | 25100/50000 [4:33:14<4:26:33,  1.56it/s]
                                                                                
{'loss': 3.3012, 'grad_norm': 5.023917198181152, 'learning_rate': 0.000498, 'epoch': 1.31}

 50%|████████████████▋                | 25200/50000 [4:34:19<4:34:01,  1.51it/s]
                                                                                
{'loss': 3.3066, 'grad_norm': 4.104577541351318, 'learning_rate': 0.000496, 'epoch': 1.32}

 51%|████████████████▋                | 25300/50000 [4:35:24<4:52:23,  1.41it/s]
                                                                                
{'loss': 3.3308, 'grad_norm': 5.3140130043029785, 'learning_rate': 0.000494, 'epoch': 1.32}



 51%|████████████████▊                | 25382/50000 [4:36:17<4:33:32,  1.50it/s]


 51%|████████████████▊                | 25383/50000 [4:36:17<4:31:43,  1.51it/s]


 51%|████████████████▊                | 25384/50000 [4:36:18<4:34:37,  1.49it/s]


 51%|████████████████▊                | 25385/50000 [4:36:19<4:20:15,  1.58it/s]


 51%|████████████████▊                | 25386/50000 [4:36:19<4:18:42,  1.59it/s]


 51%|████████████████▊                | 25387/50000 [4:36:20<4:09:25,  1.64it/s]


 51%|████████████████▊                | 25388/50000 [4:36:20<4:13:27,  1.62it/s]


 51%|████████████████▊                | 25389/50000 [4:36:21<4:15:59,  1.60it/s]


 51%|████████████████▊                | 25390/50000 [4:36:22<4:17:03,  1.60it/s]


 51%|████████████████▊                | 25391/50000 [4:36:23<4:46:37,  1.43it/s]


 51%|████████████████▊                | 25392/50000 [4:36:23<4:34:36,  1.49it/s]


 51%|████████████████▊                | 25393/50000 [4:36:24<4:24:40,  1.55it/s]


 51%|████████████████▊                | 25394/50000 [4:36:24<4:21:35,  1.57it/s]


 51%|████████████████▊                | 25395/50000 [4:36:25<4:33:39,  1.50it/s]


 51%|████████████████▊                | 25396/50000 [4:36:26<4:33:24,  1.50it/s]


 51%|████████████████▊                | 25397/50000 [4:36:26<4:33:52,  1.50it/s]


 51%|████████████████▊                | 25398/50000 [4:36:27<4:34:14,  1.50it/s]


 51%|████████████████▊                | 25399/50000 [4:36:28<4:20:30,  1.57it/s]


 51%|████████████████▊                | 25400/50000 [4:36:28<4:12:28,  1.62it/s]
                                                                                
{'loss': 3.3045, 'grad_norm': 3.5138819217681885, 'learning_rate': 0.000492, 'epoch': 1.33}

 51%|████████████████▊                | 25500/50000 [4:37:34<4:45:19,  1.43it/s]
                                                                                
{'loss': 3.3065, 'grad_norm': 4.011852264404297, 'learning_rate': 0.00049, 'epoch': 1.34}

 51%|████████████████▉                | 25600/50000 [4:38:39<4:35:08,  1.48it/s]
                                                                                
{'loss': 3.2704, 'grad_norm': 3.5789482593536377, 'learning_rate': 0.000488, 'epoch': 1.34}

 51%|████████████████▉                | 25700/50000 [4:39:44<4:30:35,  1.50it/s]


                                                                                
{'loss': 3.2994, 'grad_norm': 3.6215548515319824, 'learning_rate': 0.000486, 'epoch': 1.35}



 52%|████████████████▉                | 25751/50000 [4:40:19<4:17:55,  1.57it/s]


 52%|████████████████▉                | 25752/50000 [4:40:19<4:16:37,  1.57it/s]


 52%|████████████████▉                | 25753/50000 [4:40:20<4:18:09,  1.57it/s]


 52%|████████████████▉                | 25754/50000 [4:40:21<4:19:06,  1.56it/s]


 52%|████████████████▉                | 25755/50000 [4:40:21<4:03:19,  1.66it/s]


 52%|████████████████▉                | 25756/50000 [4:40:22<3:56:28,  1.71it/s]


 52%|████████████████▉                | 25757/50000 [4:40:22<4:04:56,  1.65it/s]


 52%|█████████████████                | 25758/50000 [4:40:23<4:07:39,  1.63it/s]


 52%|█████████████████                | 25759/50000 [4:40:23<4:12:23,  1.60it/s]


 52%|█████████████████                | 25760/50000 [4:40:24<4:14:57,  1.58it/s]


 52%|█████████████████                | 25761/50000 [4:40:25<4:21:24,  1.55it/s]


 52%|█████████████████                | 25762/50000 [4:40:25<4:10:36,  1.61it/s]


 52%|█████████████████                | 25763/50000 [4:40:26<4:03:46,  1.66it/s]


 52%|█████████████████                | 25764/50000 [4:40:27<4:07:25,  1.63it/s]


 52%|█████████████████                | 25765/50000 [4:40:27<4:14:35,  1.59it/s]


 52%|█████████████████                | 25766/50000 [4:40:28<4:09:34,  1.62it/s]


 52%|█████████████████                | 25767/50000 [4:40:29<4:24:43,  1.53it/s]


 52%|█████████████████                | 25768/50000 [4:40:29<4:24:15,  1.53it/s]


 52%|█████████████████                | 25769/50000 [4:40:30<4:22:24,  1.54it/s]


 52%|█████████████████                | 25770/50000 [4:40:30<4:06:14,  1.64it/s]


 52%|█████████████████                | 25771/50000 [4:40:31<4:07:32,  1.63it/s]


 52%|█████████████████                | 25772/50000 [4:40:32<4:03:50,  1.66it/s]


 52%|█████████████████                | 25773/50000 [4:40:32<4:16:32,  1.57it/s]


 52%|█████████████████                | 25774/50000 [4:40:33<4:16:15,  1.58it/s]


 52%|█████████████████                | 25775/50000 [4:40:34<4:17:34,  1.57it/s]


 52%|█████████████████                | 25776/50000 [4:40:34<4:06:16,  1.64it/s]


 52%|█████████████████                | 25777/50000 [4:40:35<3:51:11,  1.75it/s]


 52%|█████████████████                | 25778/50000 [4:40:35<3:53:55,  1.73it/s]


 52%|█████████████████                | 25779/50000 [4:40:36<4:01:21,  1.67it/s]


 52%|█████████████████                | 25780/50000 [4:40:36<4:01:50,  1.67it/s]


 52%|█████████████████                | 25781/50000 [4:40:37<4:16:14,  1.58it/s]


 52%|█████████████████                | 25782/50000 [4:40:38<4:19:55,  1.55it/s]


 52%|█████████████████                | 25783/50000 [4:40:38<4:13:16,  1.59it/s]


 52%|█████████████████                | 25784/50000 [4:40:39<4:19:18,  1.56it/s]


 52%|█████████████████                | 25785/50000 [4:40:40<4:13:33,  1.59it/s]


 52%|█████████████████                | 25786/50000 [4:40:40<4:25:07,  1.52it/s]


 52%|█████████████████                | 25787/50000 [4:40:41<4:31:31,  1.49it/s]


 52%|█████████████████                | 25788/50000 [4:40:42<4:19:22,  1.56it/s]


 52%|█████████████████                | 25789/50000 [4:40:42<4:21:17,  1.54it/s]


 52%|█████████████████                | 25790/50000 [4:40:43<4:23:20,  1.53it/s]


 52%|█████████████████                | 25791/50000 [4:40:44<4:48:35,  1.40it/s]


 52%|█████████████████                | 25792/50000 [4:40:45<4:51:11,  1.39it/s]


 52%|█████████████████                | 25793/50000 [4:40:45<4:42:45,  1.43it/s]


 52%|█████████████████                | 25794/50000 [4:40:46<4:25:27,  1.52it/s]


 52%|█████████████████                | 25795/50000 [4:40:46<4:13:19,  1.59it/s]


 52%|█████████████████                | 25796/50000 [4:40:47<4:14:40,  1.58it/s]


 52%|█████████████████                | 25797/50000 [4:40:48<4:28:20,  1.50it/s]


 52%|█████████████████                | 25798/50000 [4:40:48<4:20:51,  1.55it/s]


 52%|█████████████████                | 25799/50000 [4:40:49<4:21:42,  1.54it/s]


 52%|█████████████████                | 25800/50000 [4:40:50<4:19:06,  1.56it/s]
                                                                                
{'loss': 3.3031, 'grad_norm': 13.483455657958984, 'learning_rate': 0.000484, 'epoch': 1.35}

 52%|█████████████████                | 25900/50000 [4:41:55<3:57:12,  1.69it/s]
                                                                                
{'loss': 3.3192, 'grad_norm': 3.1255199909210205, 'learning_rate': 0.000482, 'epoch': 1.36}

 52%|█████████████████▏               | 26000/50000 [4:43:01<4:36:34,  1.45it/s]
                                                                                
{'loss': 3.3289, 'grad_norm': 3.0548934936523438, 'learning_rate': 0.00048, 'epoch': 1.36}

 52%|█████████████████▏               | 26100/50000 [4:44:07<4:32:53,  1.46it/s]
                                                                                
{'loss': 3.2648, 'grad_norm': 3.5611772537231445, 'learning_rate': 0.00047799999999999996, 'epoch': 1.37}

 52%|█████████████████▎               | 26200/50000 [4:45:11<4:47:31,  1.38it/s]
                                                                                
{'loss': 3.3339, 'grad_norm': 3.1189804077148438, 'learning_rate': 0.00047599999999999997, 'epoch': 1.37}

 53%|█████████████████▎               | 26300/50000 [4:46:17<4:17:53,  1.53it/s]
                                                                                
{'loss': 3.2929, 'grad_norm': 3.100290298461914, 'learning_rate': 0.000474, 'epoch': 1.38}

 53%|█████████████████▍               | 26400/50000 [4:47:21<4:06:16,  1.60it/s]
                                                                                
{'loss': 3.2736, 'grad_norm': 2.9376258850097656, 'learning_rate': 0.000472, 'epoch': 1.38}

 53%|█████████████████▍               | 26400/50000 [4:47:21<4:06:16,  1.60it/s]


 53%|█████████████████▍               | 26401/50000 [4:47:22<4:13:20,  1.55it/s]


 53%|█████████████████▍               | 26402/50000 [4:47:22<4:23:48,  1.49it/s]


 53%|█████████████████▍               | 26403/50000 [4:47:23<4:21:09,  1.51it/s]


 53%|█████████████████▍               | 26404/50000 [4:47:24<4:09:36,  1.58it/s]


 53%|█████████████████▍               | 26405/50000 [4:47:24<4:14:53,  1.54it/s]


 53%|█████████████████▍               | 26406/50000 [4:47:25<4:09:35,  1.58it/s]


 53%|█████████████████▍               | 26407/50000 [4:47:26<4:20:25,  1.51it/s]


 53%|█████████████████▍               | 26408/50000 [4:47:26<4:38:03,  1.41it/s]


 53%|█████████████████▍               | 26409/50000 [4:47:27<4:40:17,  1.40it/s]


 53%|█████████████████▍               | 26410/50000 [4:47:28<4:24:26,  1.49it/s]


 53%|█████████████████▍               | 26411/50000 [4:47:28<4:09:38,  1.57it/s]


 53%|█████████████████▍               | 26412/50000 [4:47:29<4:46:13,  1.37it/s]


 53%|█████████████████▍               | 26413/50000 [4:47:30<4:32:01,  1.45it/s]


 53%|█████████████████▍               | 26414/50000 [4:47:31<4:36:37,  1.42it/s]


 53%|█████████████████▍               | 26415/50000 [4:47:31<4:15:16,  1.54it/s]


 53%|█████████████████▍               | 26416/50000 [4:47:32<4:39:33,  1.41it/s]


 53%|█████████████████▍               | 26417/50000 [4:47:33<4:34:05,  1.43it/s]


 53%|█████████████████▍               | 26418/50000 [4:47:33<4:25:52,  1.48it/s]


 53%|█████████████████▍               | 26419/50000 [4:47:34<4:30:48,  1.45it/s]


 53%|█████████████████▍               | 26420/50000 [4:47:35<4:25:38,  1.48it/s]


 53%|█████████████████▍               | 26421/50000 [4:47:35<4:23:36,  1.49it/s]


 53%|█████████████████▍               | 26422/50000 [4:47:36<4:06:07,  1.60it/s]


 53%|█████████████████▍               | 26423/50000 [4:47:36<3:57:17,  1.66it/s]


 53%|█████████████████▍               | 26424/50000 [4:47:37<3:51:06,  1.70it/s]


 53%|█████████████████▍               | 26425/50000 [4:47:37<3:56:17,  1.66it/s]


 53%|█████████████████▍               | 26426/50000 [4:47:38<4:12:59,  1.55it/s]


 53%|█████████████████▍               | 26427/50000 [4:47:39<4:05:03,  1.60it/s]


 53%|█████████████████▍               | 26428/50000 [4:47:39<4:09:12,  1.58it/s]


 53%|█████████████████▍               | 26429/50000 [4:47:40<3:54:33,  1.67it/s]


 53%|█████████████████▍               | 26430/50000 [4:47:41<3:58:02,  1.65it/s]


 53%|█████████████████▍               | 26431/50000 [4:47:41<4:03:26,  1.61it/s]


 53%|█████████████████▍               | 26432/50000 [4:47:42<3:58:19,  1.65it/s]


 53%|█████████████████▍               | 26433/50000 [4:47:42<3:56:58,  1.66it/s]


 53%|█████████████████▍               | 26434/50000 [4:47:43<4:01:10,  1.63it/s]


 53%|█████████████████▍               | 26435/50000 [4:47:44<4:06:27,  1.59it/s]


 53%|█████████████████▍               | 26436/50000 [4:47:44<4:07:48,  1.58it/s]


 53%|█████████████████▍               | 26437/50000 [4:47:45<4:22:29,  1.50it/s]


 53%|█████████████████▍               | 26438/50000 [4:47:46<4:10:54,  1.57it/s]


 53%|█████████████████▍               | 26439/50000 [4:47:46<4:01:22,  1.63it/s]


 53%|█████████████████▍               | 26440/50000 [4:47:47<4:18:47,  1.52it/s]


 53%|█████████████████▍               | 26441/50000 [4:47:48<4:11:17,  1.56it/s]


 53%|█████████████████▍               | 26442/50000 [4:47:48<4:00:06,  1.64it/s]


 53%|█████████████████▍               | 26443/50000 [4:47:49<4:14:03,  1.55it/s]


 53%|█████████████████▍               | 26500/50000 [4:48:25<4:00:16,  1.63it/s]
                                                                                
{'loss': 3.2465, 'grad_norm': 4.3433332443237305, 'learning_rate': 0.00047, 'epoch': 1.39}

 53%|█████████████████▌               | 26600/50000 [4:49:29<3:52:23,  1.68it/s]
                                                                                
{'loss': 3.2933, 'grad_norm': 4.235383987426758, 'learning_rate': 0.00046800000000000005, 'epoch': 1.39}

 53%|█████████████████▌               | 26700/50000 [4:50:35<4:15:41,  1.52it/s]
                                                                                
{'loss': 3.265, 'grad_norm': 4.997198581695557, 'learning_rate': 0.00046600000000000005, 'epoch': 1.4}

 54%|█████████████████▋               | 26800/50000 [4:51:40<4:02:40,  1.59it/s]
                                                                                
{'loss': 3.2713, 'grad_norm': 3.5518946647644043, 'learning_rate': 0.00046400000000000006, 'epoch': 1.4}



 54%|█████████████████▋               | 26861/50000 [4:52:19<4:21:13,  1.48it/s]


 54%|█████████████████▋               | 26862/50000 [4:52:20<4:12:25,  1.53it/s]


 54%|█████████████████▋               | 26863/50000 [4:52:21<4:09:55,  1.54it/s]


 54%|█████████████████▋               | 26864/50000 [4:52:21<4:02:20,  1.59it/s]


 54%|█████████████████▋               | 26865/50000 [4:52:22<4:13:23,  1.52it/s]


 54%|█████████████████▋               | 26866/50000 [4:52:22<4:06:56,  1.56it/s]


 54%|█████████████████▋               | 26867/50000 [4:52:23<4:06:48,  1.56it/s]


 54%|█████████████████▋               | 26868/50000 [4:52:24<3:53:08,  1.65it/s]


 54%|█████████████████▋               | 26869/50000 [4:52:24<3:52:54,  1.66it/s]


 54%|█████████████████▋               | 26870/50000 [4:52:25<4:00:08,  1.61it/s]


 54%|█████████████████▋               | 26871/50000 [4:52:26<4:07:18,  1.56it/s]


 54%|█████████████████▋               | 26872/50000 [4:52:26<3:56:11,  1.63it/s]


 54%|█████████████████▋               | 26873/50000 [4:52:27<4:11:17,  1.53it/s]


 54%|█████████████████▋               | 26874/50000 [4:52:28<4:10:32,  1.54it/s]


 54%|█████████████████▋               | 26875/50000 [4:52:28<4:28:57,  1.43it/s]


 54%|█████████████████▋               | 26876/50000 [4:52:29<4:15:39,  1.51it/s]


 54%|█████████████████▋               | 26877/50000 [4:52:30<4:34:04,  1.41it/s]


 54%|█████████████████▋               | 26878/50000 [4:52:30<4:27:21,  1.44it/s]


 54%|█████████████████▋               | 26879/50000 [4:52:31<4:35:28,  1.40it/s]


 54%|█████████████████▋               | 26880/50000 [4:52:32<4:17:29,  1.50it/s]


 54%|█████████████████▋               | 26881/50000 [4:52:32<4:23:09,  1.46it/s]


 54%|█████████████████▋               | 26882/50000 [4:52:33<4:23:33,  1.46it/s]


 54%|█████████████████▋               | 26883/50000 [4:52:34<4:21:36,  1.47it/s]


 54%|█████████████████▋               | 26884/50000 [4:52:34<4:22:28,  1.47it/s]


 54%|█████████████████▋               | 26885/50000 [4:52:35<4:20:48,  1.48it/s]


 54%|█████████████████▋               | 26886/50000 [4:52:36<4:24:47,  1.45it/s]


 54%|█████████████████▋               | 26887/50000 [4:52:36<4:12:59,  1.52it/s]


 54%|█████████████████▋               | 26888/50000 [4:52:37<4:15:39,  1.51it/s]


 54%|█████████████████▋               | 26889/50000 [4:52:38<4:14:33,  1.51it/s]


 54%|█████████████████▋               | 26890/50000 [4:52:39<4:24:22,  1.46it/s]


 54%|█████████████████▋               | 26891/50000 [4:52:39<4:28:57,  1.43it/s]


 54%|█████████████████▋               | 26892/50000 [4:52:40<4:30:27,  1.42it/s]


 54%|█████████████████▋               | 26893/50000 [4:52:41<4:22:44,  1.47it/s]


 54%|█████████████████▊               | 26894/50000 [4:52:41<4:26:09,  1.45it/s]


 54%|█████████████████▊               | 26895/50000 [4:52:42<4:28:32,  1.43it/s]


 54%|█████████████████▊               | 26896/50000 [4:52:43<4:31:50,  1.42it/s]


 54%|█████████████████▊               | 26897/50000 [4:52:43<4:25:50,  1.45it/s]


 54%|█████████████████▊               | 26898/50000 [4:52:44<4:20:49,  1.48it/s]


 54%|█████████████████▊               | 26899/50000 [4:52:45<4:21:37,  1.47it/s]


 54%|█████████████████▊               | 26900/50000 [4:52:45<4:29:43,  1.43it/s]
                                                                                
{'loss': 3.3018, 'grad_norm': 3.3360676765441895, 'learning_rate': 0.000462, 'epoch': 1.41}

 54%|█████████████████▊               | 26900/50000 [4:52:45<4:29:43,  1.43it/s]


{'loss': 3.3022, 'grad_norm': 4.703917503356934, 'learning_rate': 0.00046, 'epoch': 1.41}

 54%|█████████████████▊               | 27000/50000 [4:53:51<4:09:50,  1.53it/s]


{'loss': 3.3255, 'grad_norm': 3.911872625350952, 'learning_rate': 0.000458, 'epoch': 1.42}

 54%|█████████████████▉               | 27100/50000 [4:54:57<3:41:13,  1.73it/s]


{'loss': 3.267, 'grad_norm': 2.942537307739258, 'learning_rate': 0.000456, 'epoch': 1.42}

 54%|█████████████████▉               | 27200/50000 [4:56:03<4:20:41,  1.46it/s]


{'loss': 3.3289, 'grad_norm': 3.379270076751709, 'learning_rate': 0.00045400000000000003, 'epoch': 1.43}

 55%|██████████████████               | 27300/50000 [4:57:08<3:50:02,  1.64it/s]


{'loss': 3.2896, 'grad_norm': 3.929229497909546, 'learning_rate': 0.00045200000000000004, 'epoch': 1.43}

 55%|██████████████████               | 27400/50000 [4:58:13<3:53:16,  1.61it/s]


{'loss': 3.2517, 'grad_norm': 3.584094762802124, 'learning_rate': 0.00045000000000000004, 'epoch': 1.44}

 55%|██████████████████▏              | 27500/50000 [4:59:18<3:55:43,  1.59it/s]


{'loss': 3.2693, 'grad_norm': 5.222185134887695, 'learning_rate': 0.000448, 'epoch': 1.45}

 55%|██████████████████▎              | 27700/50000 [5:01:26<3:55:03,  1.58it/s]
                                                                                
{'loss': 3.2952, 'grad_norm': 3.3396942615509033, 'learning_rate': 0.000446, 'epoch': 1.45}

 56%|██████████████████▎              | 27800/50000 [5:02:30<3:52:01,  1.59it/s]
                                                                                
{'loss': 3.3109, 'grad_norm': 2.9455134868621826, 'learning_rate': 0.000444, 'epoch': 1.46}

 56%|██████████████████▍              | 27900/50000 [5:03:36<4:33:03,  1.35it/s]
                                                                                
{'loss': 3.2891, 'grad_norm': 3.1650712490081787, 'learning_rate': 0.000442, 'epoch': 1.46}



 56%|██████████████████▍              | 27971/50000 [5:04:22<3:53:13,  1.57it/s]


 56%|██████████████████▍              | 27972/50000 [5:04:23<3:53:54,  1.57it/s]


 56%|██████████████████▍              | 27973/50000 [5:04:23<3:55:13,  1.56it/s]


 56%|██████████████████▍              | 27974/50000 [5:04:24<3:48:26,  1.61it/s]


 56%|██████████████████▍              | 27975/50000 [5:04:25<3:58:57,  1.54it/s]


 56%|██████████████████▍              | 27976/50000 [5:04:25<3:41:44,  1.66it/s]


 56%|██████████████████▍              | 27977/50000 [5:04:26<3:38:16,  1.68it/s]


 56%|██████████████████▍              | 27978/50000 [5:04:26<3:45:34,  1.63it/s]


 56%|██████████████████▍              | 27979/50000 [5:04:27<3:46:22,  1.62it/s]


 56%|██████████████████▍              | 27980/50000 [5:04:28<3:52:21,  1.58it/s]


 56%|██████████████████▍              | 27981/50000 [5:04:28<3:44:44,  1.63it/s]


 56%|██████████████████▍              | 27982/50000 [5:04:29<4:02:44,  1.51it/s]


 56%|██████████████████▍              | 27983/50000 [5:04:30<4:01:42,  1.52it/s]


 56%|██████████████████▍              | 27984/50000 [5:04:30<3:51:16,  1.59it/s]


 56%|██████████████████▍              | 27985/50000 [5:04:31<3:59:59,  1.53it/s]


 56%|██████████████████▍              | 27986/50000 [5:04:32<3:58:19,  1.54it/s]


 56%|██████████████████▍              | 27987/50000 [5:04:32<3:46:50,  1.62it/s]


 56%|██████████████████▍              | 27988/50000 [5:04:33<3:54:02,  1.57it/s]


 56%|██████████████████▍              | 27989/50000 [5:04:34<3:54:51,  1.56it/s]


 56%|██████████████████▍              | 27990/50000 [5:04:34<3:47:11,  1.61it/s]


 56%|██████████████████▍              | 27991/50000 [5:04:35<3:42:27,  1.65it/s]


 56%|██████████████████▍              | 27992/50000 [5:04:35<3:37:06,  1.69it/s]


 56%|██████████████████▍              | 27993/50000 [5:04:36<3:40:58,  1.66it/s]


 56%|██████████████████▍              | 27994/50000 [5:04:37<3:54:20,  1.57it/s]


 56%|██████████████████▍              | 27995/50000 [5:04:37<3:53:02,  1.57it/s]


 56%|██████████████████▍              | 27996/50000 [5:04:38<4:04:54,  1.50it/s]


 56%|██████████████████▍              | 27997/50000 [5:04:39<3:59:44,  1.53it/s]


 56%|██████████████████▍              | 27998/50000 [5:04:39<3:44:37,  1.63it/s]


 56%|██████████████████▍              | 27999/50000 [5:04:40<4:03:12,  1.51it/s]


 56%|██████████████████▍              | 28000/50000 [5:04:41<4:25:00,  1.38it/s]
                                                                                
{'loss': 3.2877, 'grad_norm': 4.570897579193115, 'learning_rate': 0.00044, 'epoch': 1.47}

 56%|██████████████████▌              | 28100/50000 [5:05:43<3:41:10,  1.65it/s]
                                                                                
{'loss': 3.2938, 'grad_norm': 3.5880777835845947, 'learning_rate': 0.000438, 'epoch': 1.47}

 56%|██████████████████▌              | 28200/50000 [5:06:47<3:56:16,  1.54it/s]
                                                                                
{'loss': 3.2855, 'grad_norm': 2.957921028137207, 'learning_rate': 0.000436, 'epoch': 1.48}

 57%|██████████████████▋              | 28300/50000 [5:07:53<4:01:12,  1.50it/s]
                                                                                
{'loss': 3.2999, 'grad_norm': 6.539376735687256, 'learning_rate': 0.00043400000000000003, 'epoch': 1.48}


 57%|██████████████████▋              | 28400/50000 [5:08:56<3:49:57,  1.57it/s]
                                                                                
{'loss': 3.2262, 'grad_norm': 3.0274910926818848, 'learning_rate': 0.000432, 'epoch': 1.49}


 57%|██████████████████▊              | 28500/50000 [5:10:01<3:50:52,  1.55it/s]
                                                                                
{'loss': 3.2646, 'grad_norm': 2.9462435245513916, 'learning_rate': 0.00043, 'epoch': 1.49}


 57%|██████████████████▉              | 28600/50000 [5:11:05<3:46:32,  1.57it/s]
                                                                                
{'loss': 3.2726, 'grad_norm': 3.2259747982025146, 'learning_rate': 0.000428, 'epoch': 1.5}


 57%|██████████████████▉              | 28700/50000 [5:12:11<3:38:34,  1.62it/s]


                                                                                
{'loss': 3.28, 'grad_norm': 3.0155959129333496, 'learning_rate': 0.000426, 'epoch': 1.5}

 58%|███████████████████              | 28800/50000 [5:13:16<3:43:39,  1.58it/s]
                                                                                
{'loss': 3.271, 'grad_norm': 3.6103060245513916, 'learning_rate': 0.000424, 'epoch': 1.51}

 58%|███████████████████              | 28900/50000 [5:14:21<3:33:24,  1.65it/s]
                                                                                
{'loss': 3.2894, 'grad_norm': 4.201901912689209, 'learning_rate': 0.000422, 'epoch': 1.51}

 58%|███████████████████▏             | 29000/50000 [5:15:25<3:43:22,  1.57it/s]
                                                                                
{'loss': 3.2839, 'grad_norm': 3.9266700744628906, 'learning_rate': 0.00042, 'epoch': 1.52}

 58%|███████████████████▏             | 29080/50000 [5:16:18<3:51:18,  1.51it/s]


 58%|███████████████████▏             | 29081/50000 [5:16:19<3:47:28,  1.53it/s]


 58%|███████████████████▏             | 29082/50000 [5:16:19<3:50:03,  1.52it/s]


 58%|███████████████████▏             | 29083/50000 [5:16:20<4:09:42,  1.40it/s]


 58%|███████████████████▏             | 29084/50000 [5:16:21<4:22:10,  1.33it/s]


 58%|███████████████████▏             | 29085/50000 [5:16:22<3:54:54,  1.48it/s]


 58%|███████████████████▏             | 29086/50000 [5:16:22<3:51:58,  1.50it/s]


 58%|███████████████████▏             | 29087/50000 [5:16:23<3:59:25,  1.46it/s]


 58%|███████████████████▏             | 29088/50000 [5:16:24<3:55:46,  1.48it/s]


 58%|███████████████████▏             | 29089/50000 [5:16:24<4:04:03,  1.43it/s]


 58%|███████████████████▏             | 29090/50000 [5:16:25<4:02:31,  1.44it/s]


 58%|███████████████████▏             | 29091/50000 [5:16:26<3:48:17,  1.53it/s]


 58%|███████████████████▏             | 29092/50000 [5:16:26<3:46:13,  1.54it/s]


 58%|███████████████████▏             | 29093/50000 [5:16:27<3:42:05,  1.57it/s]


 58%|███████████████████▏             | 29094/50000 [5:16:27<3:45:10,  1.55it/s]


 58%|███████████████████▏             | 29095/50000 [5:16:28<3:41:19,  1.57it/s]


 58%|███████████████████▏             | 29096/50000 [5:16:29<3:45:49,  1.54it/s]


 58%|███████████████████▏             | 29097/50000 [5:16:29<3:40:29,  1.58it/s]


 58%|███████████████████▏             | 29098/50000 [5:16:30<3:23:30,  1.71it/s]


 58%|███████████████████▏             | 29099/50000 [5:16:31<3:43:47,  1.56it/s]


 58%|███████████████████▏             | 29100/50000 [5:16:31<3:38:24,  1.59it/s]
                                                                                
{'loss': 3.271, 'grad_norm': 3.076934337615967, 'learning_rate': 0.00041799999999999997, 'epoch': 1.52}

 58%|███████████████████▎             | 29200/50000 [5:17:38<4:11:21,  1.38it/s]
                                                                                
{'loss': 3.2777, 'grad_norm': 6.768975734710693, 'learning_rate': 0.000416, 'epoch': 1.53}

 59%|███████████████████▎             | 29300/50000 [5:18:43<3:46:24,  1.52it/s]
                                                                                
{'loss': 3.2449, 'grad_norm': 3.7545790672302246, 'learning_rate': 0.000414, 'epoch': 1.53}

 59%|███████████████████▍             | 29400/50000 [5:19:49<3:47:28,  1.51it/s]
                                                                                
{'loss': 3.2904, 'grad_norm': 2.822556257247925, 'learning_rate': 0.000412, 'epoch': 1.54}



 59%|███████████████████▍             | 29450/50000 [5:20:21<3:41:55,  1.54it/s]


 59%|███████████████████▍             | 29451/50000 [5:20:22<3:35:24,  1.59it/s]


 59%|███████████████████▍             | 29452/50000 [5:20:22<3:27:38,  1.65it/s]


 59%|███████████████████▍             | 29453/50000 [5:20:23<3:41:16,  1.55it/s]


 59%|███████████████████▍             | 29454/50000 [5:20:24<3:49:00,  1.50it/s]


 59%|███████████████████▍             | 29455/50000 [5:20:25<4:26:49,  1.28it/s]


 59%|███████████████████▍             | 29456/50000 [5:20:26<4:06:52,  1.39it/s]


 59%|███████████████████▍             | 29457/50000 [5:20:26<4:17:27,  1.33it/s]


 59%|███████████████████▍             | 29458/50000 [5:20:27<4:08:06,  1.38it/s]


 59%|███████████████████▍             | 29459/50000 [5:20:28<4:06:46,  1.39it/s]


 59%|███████████████████▍             | 29460/50000 [5:20:28<3:59:18,  1.43it/s]


 59%|███████████████████▍             | 29461/50000 [5:20:29<3:49:31,  1.49it/s]


 59%|███████████████████▍             | 29462/50000 [5:20:30<3:56:27,  1.45it/s]


 59%|███████████████████▍             | 29463/50000 [5:20:30<3:50:06,  1.49it/s]


 59%|███████████████████▍             | 29464/50000 [5:20:31<3:46:56,  1.51it/s]


 59%|███████████████████▍             | 29465/50000 [5:20:32<3:40:49,  1.55it/s]


 59%|███████████████████▍             | 29466/50000 [5:20:32<3:35:14,  1.59it/s]


 59%|███████████████████▍             | 29467/50000 [5:20:33<3:29:27,  1.63it/s]


 59%|███████████████████▍             | 29468/50000 [5:20:33<3:24:44,  1.67it/s]


 59%|███████████████████▍             | 29469/50000 [5:20:34<3:47:59,  1.50it/s]


 59%|███████████████████▍             | 29470/50000 [5:20:35<3:40:15,  1.55it/s]


 59%|███████████████████▍             | 29471/50000 [5:20:35<3:35:06,  1.59it/s]


 59%|███████████████████▍             | 29472/50000 [5:20:36<3:35:15,  1.59it/s]


 59%|███████████████████▍             | 29473/50000 [5:20:37<4:08:26,  1.38it/s]


 59%|███████████████████▍             | 29474/50000 [5:20:38<3:54:33,  1.46it/s]


 59%|███████████████████▍             | 29475/50000 [5:20:38<3:46:09,  1.51it/s]


 59%|███████████████████▍             | 29476/50000 [5:20:39<3:34:30,  1.59it/s]


 59%|███████████████████▍             | 29477/50000 [5:20:40<3:53:34,  1.46it/s]


 59%|███████████████████▍             | 29478/50000 [5:20:40<3:53:14,  1.47it/s]


 59%|███████████████████▍             | 29479/50000 [5:20:41<4:06:33,  1.39it/s]


 59%|███████████████████▍             | 29480/50000 [5:20:42<3:51:58,  1.47it/s]


 59%|███████████████████▍             | 29481/50000 [5:20:42<3:38:30,  1.57it/s]


 59%|███████████████████▍             | 29482/50000 [5:20:43<3:29:06,  1.64it/s]


 59%|███████████████████▍             | 29483/50000 [5:20:43<3:25:05,  1.67it/s]


 59%|███████████████████▍             | 29484/50000 [5:20:44<3:32:39,  1.61it/s]


 59%|███████████████████▍             | 29485/50000 [5:20:45<3:30:46,  1.62it/s]


 59%|███████████████████▍             | 29486/50000 [5:20:45<3:27:30,  1.65it/s]


 59%|███████████████████▍             | 29487/50000 [5:20:46<3:52:21,  1.47it/s]


 59%|███████████████████▍             | 29488/50000 [5:20:47<4:10:25,  1.37it/s]


 59%|███████████████████▍             | 29489/50000 [5:20:47<3:51:50,  1.47it/s]


 59%|███████████████████▍             | 29490/50000 [5:20:48<4:02:25,  1.41it/s]


 59%|███████████████████▍             | 29491/50000 [5:20:49<3:58:31,  1.43it/s]


 59%|███████████████████▍             | 29492/50000 [5:20:50<4:00:38,  1.42it/s]


 59%|███████████████████▍             | 29493/50000 [5:20:50<3:53:28,  1.46it/s]


 59%|███████████████████▍             | 29494/50000 [5:20:51<3:43:55,  1.53it/s]


 59%|███████████████████▍             | 29495/50000 [5:20:51<3:51:54,  1.47it/s]


 59%|███████████████████▍             | 29496/50000 [5:20:52<3:49:23,  1.49it/s]


 59%|███████████████████▍             | 29497/50000 [5:20:53<3:40:34,  1.55it/s]


 59%|███████████████████▍             | 29498/50000 [5:20:53<3:32:24,  1.61it/s]


 59%|███████████████████▍             | 29499/50000 [5:20:54<3:27:21,  1.65it/s]


 59%|███████████████████▍             | 29500/50000 [5:20:54<3:22:32,  1.69it/s]
                                                                                
{'loss': 3.2841, 'grad_norm': 2.966561794281006, 'learning_rate': 0.00041, 'epoch': 1.54}

 59%|███████████████████▌             | 29600/50000 [5:21:59<3:55:27,  1.44it/s]
                                                                                
{'loss': 3.2597, 'grad_norm': 3.9621291160583496, 'learning_rate': 0.000408, 'epoch': 1.55}

 59%|███████████████████▌             | 29700/50000 [5:23:04<3:38:32,  1.55it/s]


                                                                                
{'loss': 3.2755, 'grad_norm': 3.0455243587493896, 'learning_rate': 0.00040600000000000006, 'epoch': 1.55}

 60%|███████████████████▋             | 29800/50000 [5:24:11<4:06:17,  1.37it/s]
                                                                                
{'loss': 3.2916, 'grad_norm': 3.910555839538574, 'learning_rate': 0.000404, 'epoch': 1.56}

 60%|███████████████████▋             | 29900/50000 [5:25:16<3:33:18,  1.57it/s]
                                                                                
{'loss': 3.215, 'grad_norm': 2.9178130626678467, 'learning_rate': 0.000402, 'epoch': 1.57}

 60%|███████████████████▊             | 30000/50000 [5:26:21<3:35:27,  1.55it/s]
                                                                                
{'loss': 3.2287, 'grad_norm': 2.8905599117279053, 'learning_rate': 0.0004, 'epoch': 1.57}

***** Running Evaluation *****
  Num examples = 50
  Batch size = 16
{'eval_rouge-1': 32.628236, 'eval_rouge-2': 8.01929, 'eval_rouge-l': 25.152721999999997, 'eval_bleu-4': 0.03645962084664121, 'eval_runtime': 12.8128, 'eval_samples_per_second': 3.902, 'eval_steps_per_second': 0.312, 'epoch': 1.57}
Saving model checkpoint to ./output/tmp-checkpoint-30000
tokenizer config file saved in ./output/tmp-checkpoint-30000/tokenizer_config.json
Special tokens file saved in ./output/tmp-checkpoint-30000/special_tokens_map.json



 60%|███████████████████▊             | 30100/50000 [5:27:40<3:19:24,  1.66it/s]
                                                                                
{'loss': 3.2407, 'grad_norm': 2.799912929534912, 'learning_rate': 0.000398, 'epoch': 1.58}



 60%|███████████████████▉             | 30175/50000 [5:28:29<3:30:56,  1.57it/s]


 60%|███████████████████▉             | 30176/50000 [5:28:30<3:47:59,  1.45it/s]


 60%|███████████████████▉             | 30177/50000 [5:28:31<3:52:37,  1.42it/s]


 60%|███████████████████▉             | 30178/50000 [5:28:31<3:44:28,  1.47it/s]


 60%|███████████████████▉             | 30179/50000 [5:28:32<3:47:30,  1.45it/s]


 60%|███████████████████▉             | 30180/50000 [5:28:33<3:35:08,  1.54it/s]


 60%|███████████████████▉             | 30181/50000 [5:28:33<3:43:35,  1.48it/s]


 60%|███████████████████▉             | 30182/50000 [5:28:34<3:48:16,  1.45it/s]


 60%|███████████████████▉             | 30183/50000 [5:28:35<3:52:11,  1.42it/s]


 60%|███████████████████▉             | 30184/50000 [5:28:35<3:38:22,  1.51it/s]


 60%|███████████████████▉             | 30185/50000 [5:28:36<3:36:10,  1.53it/s]


 60%|███████████████████▉             | 30186/50000 [5:28:37<3:29:02,  1.58it/s]


 60%|███████████████████▉             | 30187/50000 [5:28:37<3:20:55,  1.64it/s]


 60%|███████████████████▉             | 30188/50000 [5:28:38<3:19:36,  1.65it/s]


 60%|███████████████████▉             | 30189/50000 [5:28:38<3:13:41,  1.70it/s]


 60%|███████████████████▉             | 30190/50000 [5:28:39<3:12:39,  1.71it/s]


 60%|███████████████████▉             | 30191/50000 [5:28:39<3:05:23,  1.78it/s]


 60%|███████████████████▉             | 30192/50000 [5:28:40<3:13:44,  1.70it/s]


 60%|███████████████████▉             | 30193/50000 [5:28:41<3:12:42,  1.71it/s]


 60%|███████████████████▉             | 30194/50000 [5:28:41<3:10:07,  1.74it/s]


 60%|███████████████████▉             | 30195/50000 [5:28:42<3:20:08,  1.65it/s]


 60%|███████████████████▉             | 30196/50000 [5:28:42<3:24:36,  1.61it/s]


 60%|███████████████████▉             | 30197/50000 [5:28:43<3:45:15,  1.47it/s]


 60%|███████████████████▉             | 30198/50000 [5:28:44<3:39:08,  1.51it/s]


 60%|███████████████████▉             | 30199/50000 [5:28:44<3:32:57,  1.55it/s]


 60%|███████████████████▉             | 30200/50000 [5:28:45<3:41:30,  1.49it/s]
                                                                                
{'loss': 3.2449, 'grad_norm': 2.978003978729248, 'learning_rate': 0.00039600000000000003, 'epoch': 1.58}

 61%|███████████████████▉             | 30300/50000 [5:29:51<3:44:46,  1.46it/s]
                                                                                
{'loss': 3.2561, 'grad_norm': 3.5650930404663086, 'learning_rate': 0.00039400000000000004, 'epoch': 1.59}

 61%|████████████████████             | 30400/50000 [5:30:56<3:57:29,  1.38it/s]
                                                                                
{'loss': 3.2447, 'grad_norm': 3.185678005218506, 'learning_rate': 0.00039200000000000004, 'epoch': 1.59}

 61%|████████████████████▏            | 30500/50000 [5:32:01<3:21:53,  1.61it/s]
                                                                                
{'loss': 3.2386, 'grad_norm': 2.9934730529785156, 'learning_rate': 0.00039000000000000005, 'epoch': 1.6}

 61%|████████████████████▏            | 30600/50000 [5:33:07<3:35:55,  1.50it/s]
                                                                                
{'loss': 3.2692, 'grad_norm': 2.8078176975250244, 'learning_rate': 0.000388, 'epoch': 1.6}

 61%|████████████████████▎            | 30700/50000 [5:34:11<3:22:53,  1.59it/s]
                                                                                
{'loss': 3.2634, 'grad_norm': 2.940323829650879, 'learning_rate': 0.000386, 'epoch': 1.61}

 62%|████████████████████▎            | 30800/50000 [5:35:17<3:29:58,  1.52it/s]


                                                                                
{'loss': 3.2375, 'grad_norm': 3.7812066078186035, 'learning_rate': 0.000384, 'epoch': 1.61}

 62%|████████████████████▍            | 30900/50000 [5:36:23<3:38:27,  1.46it/s]
                                                                                
{'loss': 3.2482, 'grad_norm': 3.3444857597351074, 'learning_rate': 0.000382, 'epoch': 1.62}

 62%|████████████████████▍            | 31000/50000 [5:37:29<3:25:59,  1.54it/s]
                                                                                
{'loss': 3.2913, 'grad_norm': 6.552734851837158, 'learning_rate': 0.00038, 'epoch': 1.62}

 62%|████████████████████▌            | 31100/50000 [5:38:34<3:24:09,  1.54it/s]
                                                                                
{'loss': 3.2608, 'grad_norm': 3.2037084102630615, 'learning_rate': 0.000378, 'epoch': 1.63}

 62%|████████████████████▌            | 31200/50000 [5:39:39<3:19:02,  1.57it/s]
                                                                                
{'loss': 3.2409, 'grad_norm': 3.659757375717163, 'learning_rate': 0.00037600000000000003, 'epoch': 1.63}



 63%|████████████████████▋            | 31284/50000 [5:40:34<3:15:04,  1.60it/s]


 63%|████████████████████▋            | 31285/50000 [5:40:34<3:12:23,  1.62it/s]


 63%|████████████████████▋            | 31286/50000 [5:40:35<3:11:07,  1.63it/s]


 63%|████████████████████▋            | 31287/50000 [5:40:36<3:21:45,  1.55it/s]


 63%|████████████████████▋            | 31288/50000 [5:40:37<3:39:24,  1.42it/s]


 63%|████████████████████▋            | 31289/50000 [5:40:37<3:24:53,  1.52it/s]


 63%|████████████████████▋            | 31290/50000 [5:40:38<3:19:51,  1.56it/s]


 63%|████████████████████▋            | 31291/50000 [5:40:39<3:30:12,  1.48it/s]


 63%|████████████████████▋            | 31292/50000 [5:40:39<3:30:07,  1.48it/s]


 63%|████████████████████▋            | 31293/50000 [5:40:40<3:21:44,  1.55it/s]


 63%|████████████████████▋            | 31294/50000 [5:40:40<3:21:47,  1.54it/s]


 63%|████████████████████▋            | 31295/50000 [5:40:41<3:39:36,  1.42it/s]


 63%|████████████████████▋            | 31296/50000 [5:40:42<3:50:15,  1.35it/s]


 63%|████████████████████▋            | 31297/50000 [5:40:43<3:40:10,  1.42it/s]


 63%|████████████████████▋            | 31298/50000 [5:40:44<3:52:27,  1.34it/s]


 63%|████████████████████▋            | 31299/50000 [5:40:44<3:39:42,  1.42it/s]


 63%|████████████████████▋            | 31300/50000 [5:40:45<3:32:18,  1.47it/s]
                                                                                
{'loss': 3.2118, 'grad_norm': 3.4387869834899902, 'learning_rate': 0.000374, 'epoch': 1.64}

 63%|████████████████████▋            | 31400/50000 [5:41:50<3:24:10,  1.52it/s]
                                                                                
{'loss': 3.2286, 'grad_norm': 2.9451191425323486, 'learning_rate': 0.000372, 'epoch': 1.64}

 63%|████████████████████▊            | 31500/50000 [5:42:55<3:22:12,  1.52it/s]
                                                                                
{'loss': 3.2522, 'grad_norm': 2.9929981231689453, 'learning_rate': 0.00037, 'epoch': 1.65}

 63%|████████████████████▊            | 31600/50000 [5:44:01<3:31:25,  1.45it/s]
                                                                                
{'loss': 3.2496, 'grad_norm': 3.201050043106079, 'learning_rate': 0.000368, 'epoch': 1.65}



 63%|████████████████████▉            | 31653/50000 [5:44:35<3:07:20,  1.63it/s]


 63%|████████████████████▉            | 31654/50000 [5:44:35<3:25:21,  1.49it/s]


 63%|████████████████████▉            | 31655/50000 [5:44:36<3:16:36,  1.56it/s]


 63%|████████████████████▉            | 31656/50000 [5:44:37<3:20:23,  1.53it/s]


 63%|████████████████████▉            | 31657/50000 [5:44:37<3:15:24,  1.56it/s]


 63%|████████████████████▉            | 31658/50000 [5:44:38<3:19:31,  1.53it/s]


 63%|████████████████████▉            | 31659/50000 [5:44:39<3:19:26,  1.53it/s]


 63%|████████████████████▉            | 31660/50000 [5:44:39<3:13:35,  1.58it/s]


 63%|████████████████████▉            | 31661/50000 [5:44:40<3:05:30,  1.65it/s]


 63%|████████████████████▉            | 31662/50000 [5:44:40<3:02:41,  1.67it/s]


 63%|████████████████████▉            | 31663/50000 [5:44:41<3:12:56,  1.58it/s]


 63%|████████████████████▉            | 31664/50000 [5:44:42<3:23:44,  1.50it/s]


 63%|████████████████████▉            | 31665/50000 [5:44:42<3:15:26,  1.56it/s]


 63%|████████████████████▉            | 31666/50000 [5:44:43<3:18:52,  1.54it/s]


 63%|████████████████████▉            | 31667/50000 [5:44:44<3:09:24,  1.61it/s]


 63%|████████████████████▉            | 31668/50000 [5:44:44<3:05:37,  1.65it/s]


 63%|████████████████████▉            | 31669/50000 [5:44:45<3:17:08,  1.55it/s]


 63%|████████████████████▉            | 31670/50000 [5:44:46<3:15:05,  1.57it/s]


 63%|████████████████████▉            | 31671/50000 [5:44:46<3:19:26,  1.53it/s]


 63%|████████████████████▉            | 31672/50000 [5:44:47<3:22:21,  1.51it/s]


 63%|████████████████████▉            | 31673/50000 [5:44:48<3:15:35,  1.56it/s]


 63%|████████████████████▉            | 31674/50000 [5:44:48<3:18:55,  1.54it/s]


 63%|████████████████████▉            | 31675/50000 [5:44:49<3:20:43,  1.52it/s]


 63%|████████████████████▉            | 31676/50000 [5:44:49<3:10:42,  1.60it/s]


 63%|████████████████████▉            | 31677/50000 [5:44:50<3:16:09,  1.56it/s]


 63%|████████████████████▉            | 31678/50000 [5:44:51<3:10:06,  1.61it/s]


 63%|████████████████████▉            | 31679/50000 [5:44:51<3:14:31,  1.57it/s]


 63%|████████████████████▉            | 31680/50000 [5:44:52<3:14:48,  1.57it/s]


 63%|████████████████████▉            | 31681/50000 [5:44:53<3:23:57,  1.50it/s]


 63%|████████████████████▉            | 31682/50000 [5:44:53<3:30:17,  1.45it/s]


 63%|████████████████████▉            | 31683/50000 [5:44:54<3:26:54,  1.48it/s]


 63%|████████████████████▉            | 31684/50000 [5:44:55<3:11:00,  1.60it/s]


 63%|████████████████████▉            | 31685/50000 [5:44:55<3:14:35,  1.57it/s]


 63%|████████████████████▉            | 31686/50000 [5:44:56<3:09:24,  1.61it/s]


 63%|████████████████████▉            | 31687/50000 [5:44:56<3:07:06,  1.63it/s]


 63%|████████████████████▉            | 31688/50000 [5:44:57<3:10:38,  1.60it/s]


 63%|████████████████████▉            | 31689/50000 [5:44:58<3:22:05,  1.51it/s]


 63%|████████████████████▉            | 31690/50000 [5:44:59<3:22:55,  1.50it/s]


 63%|████████████████████▉            | 31691/50000 [5:44:59<3:28:17,  1.46it/s]


 63%|████████████████████▉            | 31692/50000 [5:45:00<3:24:17,  1.49it/s]


 63%|████████████████████▉            | 31693/50000 [5:45:01<3:21:04,  1.52it/s]


 63%|████████████████████▉            | 31694/50000 [5:45:01<3:06:29,  1.64it/s]


 63%|████████████████████▉            | 31695/50000 [5:45:02<3:17:05,  1.55it/s]


 63%|████████████████████▉            | 31696/50000 [5:45:02<3:16:06,  1.56it/s]


 63%|████████████████████▉            | 31697/50000 [5:45:03<3:34:27,  1.42it/s]


 63%|████████████████████▉            | 31698/50000 [5:45:04<3:28:16,  1.46it/s]


 63%|████████████████████▉            | 31699/50000 [5:45:05<3:21:34,  1.51it/s]


 63%|████████████████████▉            | 31700/50000 [5:45:05<3:20:57,  1.52it/s]
                                                                                
{'loss': 3.2542, 'grad_norm': 3.382310152053833, 'learning_rate': 0.000366, 'epoch': 1.66}

 64%|████████████████████▉            | 31800/50000 [5:46:10<3:11:28,  1.58it/s]
                                                                                
{'loss': 3.2933, 'grad_norm': 4.239021301269531, 'learning_rate': 0.000364, 'epoch': 1.66}

 64%|█████████████████████            | 31900/50000 [5:47:15<3:12:12,  1.57it/s]
                                                                                
{'loss': 3.254, 'grad_norm': 2.980713129043579, 'learning_rate': 0.000362, 'epoch': 1.67}

 64%|█████████████████████            | 32000/50000 [5:48:19<3:17:11,  1.52it/s]
                                                                                
{'loss': 3.2426, 'grad_norm': 2.664818525314331, 'learning_rate': 0.00035999999999999997, 'epoch': 1.68}

 64%|█████████████████████            | 32000/50000 [5:48:19<3:17:11,  1.52it/s]


{'loss': 3.237, 'grad_norm': 3.6223573684692383, 'learning_rate': 0.000358, 'epoch': 1.68}

 64%|█████████████████████▏           | 32100/50000 [5:49:25<3:17:13,  1.51it/s]


{'loss': 3.2264, 'grad_norm': 3.4299261569976807, 'learning_rate': 0.000356, 'epoch': 1.69}

 64%|█████████████████████▎           | 32200/50000 [5:50:30<3:00:51,  1.64it/s]


{'loss': 3.2801, 'grad_norm': 4.630538463592529, 'learning_rate': 0.000354, 'epoch': 1.69}

 65%|█████████████████████▎           | 32300/50000 [5:51:35<3:08:44,  1.56it/s]




 65%|█████████████████████▍           | 32394/50000 [5:52:37<4:06:04,  1.19it/s]


 65%|█████████████████████▍           | 32395/50000 [5:52:38<3:40:16,  1.33it/s]


 65%|█████████████████████▍           | 32396/50000 [5:52:38<3:30:28,  1.39it/s]


 65%|█████████████████████▍           | 32397/50000 [5:52:39<3:20:15,  1.47it/s]


 65%|█████████████████████▍           | 32398/50000 [5:52:39<3:18:40,  1.48it/s]


 65%|█████████████████████▍           | 32399/50000 [5:52:40<3:15:17,  1.50it/s]


 65%|█████████████████████▍           | 32400/50000 [5:52:41<3:12:57,  1.52it/s]
                                                                                
{'loss': 3.2474, 'grad_norm': 4.8407769203186035, 'learning_rate': 0.000352, 'epoch': 1.7}

 65%|█████████████████████▍           | 32400/50000 [5:52:41<3:12:57,  1.52it/s]


{'loss': 3.2361, 'grad_norm': 3.1548991203308105, 'learning_rate': 0.00035, 'epoch': 1.7}

 65%|█████████████████████▍           | 32500/50000 [5:53:46<3:14:41,  1.50it/s]


{'loss': 3.2244, 'grad_norm': 2.7622294425964355, 'learning_rate': 0.000348, 'epoch': 1.71}

 65%|█████████████████████▌           | 32600/50000 [5:54:51<3:12:13,  1.51it/s]


{'loss': 3.2584, 'grad_norm': 3.1459851264953613, 'learning_rate': 0.000346, 'epoch': 1.71}

 65%|█████████████████████▌           | 32700/50000 [5:55:55<3:09:49,  1.52it/s]




 66%|█████████████████████▌           | 32763/50000 [5:56:36<3:11:57,  1.50it/s]


 66%|█████████████████████▌           | 32764/50000 [5:56:36<3:02:25,  1.57it/s]


 66%|█████████████████████▌           | 32765/50000 [5:56:37<2:58:56,  1.61it/s]


 66%|█████████████████████▋           | 32766/50000 [5:56:38<3:00:18,  1.59it/s]


 66%|█████████████████████▋           | 32767/50000 [5:56:38<3:00:10,  1.59it/s]


 66%|█████████████████████▋           | 32768/50000 [5:56:39<3:12:54,  1.49it/s]


 66%|█████████████████████▋           | 32769/50000 [5:56:40<3:06:47,  1.54it/s]


 66%|█████████████████████▋           | 32770/50000 [5:56:40<3:04:44,  1.55it/s]


 66%|█████████████████████▋           | 32771/50000 [5:56:41<3:03:50,  1.56it/s]


 66%|█████████████████████▋           | 32772/50000 [5:56:41<3:03:43,  1.56it/s]


 66%|█████████████████████▋           | 32773/50000 [5:56:42<3:04:43,  1.55it/s]


 66%|█████████████████████▋           | 32774/50000 [5:56:43<3:03:49,  1.56it/s]


 66%|█████████████████████▋           | 32775/50000 [5:56:43<3:05:19,  1.55it/s]


 66%|█████████████████████▋           | 32776/50000 [5:56:44<3:05:48,  1.54it/s]


 66%|█████████████████████▋           | 32777/50000 [5:56:45<3:08:56,  1.52it/s]


 66%|█████████████████████▋           | 32778/50000 [5:56:45<3:11:05,  1.50it/s]


 66%|█████████████████████▋           | 32779/50000 [5:56:46<3:05:28,  1.55it/s]


 66%|█████████████████████▋           | 32780/50000 [5:56:47<3:12:39,  1.49it/s]


 66%|█████████████████████▋           | 32781/50000 [5:56:47<3:06:36,  1.54it/s]


 66%|█████████████████████▋           | 32782/50000 [5:56:48<3:05:05,  1.55it/s]


 66%|█████████████████████▋           | 32783/50000 [5:56:49<3:04:03,  1.56it/s]


 66%|█████████████████████▋           | 32784/50000 [5:56:49<3:13:51,  1.48it/s]


 66%|█████████████████████▋           | 32785/50000 [5:56:50<3:14:33,  1.47it/s]


 66%|█████████████████████▋           | 32786/50000 [5:56:51<3:10:40,  1.50it/s]


 66%|█████████████████████▋           | 32787/50000 [5:56:51<2:57:24,  1.62it/s]


 66%|█████████████████████▋           | 32788/50000 [5:56:52<2:59:15,  1.60it/s]


 66%|█████████████████████▋           | 32789/50000 [5:56:52<2:57:11,  1.62it/s]


 66%|█████████████████████▋           | 32790/50000 [5:56:53<2:59:01,  1.60it/s]


 66%|█████████████████████▋           | 32791/50000 [5:56:54<2:49:11,  1.70it/s]


 66%|█████████████████████▋           | 32792/50000 [5:56:54<2:46:54,  1.72it/s]


 66%|█████████████████████▋           | 32793/50000 [5:56:55<2:47:27,  1.71it/s]


 66%|█████████████████████▋           | 32794/50000 [5:56:55<2:42:08,  1.77it/s]


 66%|█████████████████████▋           | 32795/50000 [5:56:56<2:47:25,  1.71it/s]


 66%|█████████████████████▋           | 32796/50000 [5:56:57<3:07:21,  1.53it/s]


 66%|█████████████████████▋           | 32797/50000 [5:56:57<3:02:51,  1.57it/s]


 66%|█████████████████████▋           | 32798/50000 [5:56:58<3:10:50,  1.50it/s]


 66%|█████████████████████▋           | 32799/50000 [5:56:59<3:12:22,  1.49it/s]


 66%|█████████████████████▋           | 32800/50000 [5:56:59<3:11:46,  1.49it/s]


                                                                                
{'loss': 3.2307, 'grad_norm': 3.3386268615722656, 'learning_rate': 0.00034399999999999996, 'epoch': 1.72}

 66%|█████████████████████▋           | 32800/50000 [5:56:59<3:11:46,  1.49it/s]


{'loss': 3.2625, 'grad_norm': 3.291982889175415, 'learning_rate': 0.000342, 'epoch': 1.72}

 66%|█████████████████████▋           | 32900/50000 [5:58:03<2:57:25,  1.61it/s]


{'loss': 3.2136, 'grad_norm': 4.428757190704346, 'learning_rate': 0.00034, 'epoch': 1.73}

 66%|█████████████████████▊           | 33000/50000 [5:59:08<2:47:41,  1.69it/s]


{'loss': 3.2498, 'grad_norm': 2.9101884365081787, 'learning_rate': 0.00033800000000000003, 'epoch': 1.73}

 66%|█████████████████████▊           | 33100/50000 [6:00:13<3:06:16,  1.51it/s]



 66%|█████████████████████▊           | 33131/50000 [6:00:34<3:00:53,  1.55it/s]


 66%|█████████████████████▊           | 33132/50000 [6:00:35<3:02:11,  1.54it/s]


 66%|█████████████████████▊           | 33133/50000 [6:00:35<2:57:12,  1.59it/s]


 66%|█████████████████████▊           | 33134/50000 [6:00:36<2:58:31,  1.57it/s]


 66%|█████████████████████▊           | 33135/50000 [6:00:37<3:08:41,  1.49it/s]


 66%|█████████████████████▊           | 33136/50000 [6:00:38<3:24:09,  1.38it/s]


 66%|█████████████████████▊           | 33137/50000 [6:00:38<3:16:47,  1.43it/s]


 66%|█████████████████████▊           | 33138/50000 [6:00:39<3:18:47,  1.41it/s]


 66%|█████████████████████▊           | 33139/50000 [6:00:39<3:03:10,  1.53it/s]


 66%|█████████████████████▊           | 33140/50000 [6:00:40<3:03:45,  1.53it/s]


 66%|█████████████████████▊           | 33141/50000 [6:00:41<2:54:32,  1.61it/s]


 66%|█████████████████████▊           | 33142/50000 [6:00:41<2:53:23,  1.62it/s]


 66%|█████████████████████▊           | 33143/50000 [6:00:42<2:58:28,  1.57it/s]


 66%|█████████████████████▉           | 33144/50000 [6:00:43<2:59:51,  1.56it/s]


 66%|█████████████████████▉           | 33145/50000 [6:00:43<2:55:30,  1.60it/s]


 66%|█████████████████████▉           | 33146/50000 [6:00:44<2:51:58,  1.63it/s]


 66%|█████████████████████▉           | 33147/50000 [6:00:44<3:00:10,  1.56it/s]


 66%|█████████████████████▉           | 33148/50000 [6:00:45<3:06:28,  1.51it/s]


 66%|█████████████████████▉           | 33149/50000 [6:00:46<3:04:24,  1.52it/s]


 66%|█████████████████████▉           | 33150/50000 [6:00:47<3:17:51,  1.42it/s]


 66%|█████████████████████▉           | 33151/50000 [6:00:47<3:09:10,  1.48it/s]


 66%|█████████████████████▉           | 33152/50000 [6:00:48<3:14:29,  1.44it/s]


 66%|█████████████████████▉           | 33153/50000 [6:00:48<2:58:03,  1.58it/s]


 66%|█████████████████████▉           | 33154/50000 [6:00:49<3:08:19,  1.49it/s]


 66%|█████████████████████▉           | 33155/50000 [6:00:50<3:12:48,  1.46it/s]


 66%|█████████████████████▉           | 33156/50000 [6:00:50<2:56:19,  1.59it/s]


 66%|█████████████████████▉           | 33157/50000 [6:00:51<2:51:26,  1.64it/s]


 66%|█████████████████████▉           | 33158/50000 [6:00:52<3:12:04,  1.46it/s]


 66%|█████████████████████▉           | 33159/50000 [6:00:53<3:10:59,  1.47it/s]


 66%|█████████████████████▉           | 33160/50000 [6:00:53<3:08:02,  1.49it/s]


 66%|█████████████████████▉           | 33161/50000 [6:00:54<3:02:20,  1.54it/s]


 66%|█████████████████████▉           | 33162/50000 [6:00:55<3:13:24,  1.45it/s]


 66%|█████████████████████▉           | 33163/50000 [6:00:55<3:03:21,  1.53it/s]


 66%|█████████████████████▉           | 33164/50000 [6:00:56<2:57:56,  1.58it/s]


 66%|█████████████████████▉           | 33165/50000 [6:00:56<2:52:33,  1.63it/s]


 66%|█████████████████████▉           | 33166/50000 [6:00:57<2:49:49,  1.65it/s]


 66%|█████████████████████▉           | 33167/50000 [6:00:58<2:53:11,  1.62it/s]


 66%|█████████████████████▉           | 33168/50000 [6:00:58<3:02:39,  1.54it/s]


 66%|█████████████████████▉           | 33169/50000 [6:00:59<3:10:23,  1.47it/s]


 66%|█████████████████████▉           | 33170/50000 [6:01:00<3:02:17,  1.54it/s]


 66%|█████████████████████▉           | 33171/50000 [6:01:00<3:03:01,  1.53it/s]


 66%|█████████████████████▉           | 33172/50000 [6:01:01<2:56:49,  1.59it/s]


 66%|█████████████████████▉           | 33173/50000 [6:01:02<3:12:59,  1.45it/s]


 66%|█████████████████████▉           | 33174/50000 [6:01:02<3:17:35,  1.42it/s]


 66%|█████████████████████▉           | 33175/50000 [6:01:03<3:11:03,  1.47it/s]


 66%|█████████████████████▉           | 33176/50000 [6:01:04<3:10:43,  1.47it/s]


 66%|█████████████████████▉           | 33177/50000 [6:01:04<3:06:15,  1.51it/s]


 66%|█████████████████████▉           | 33178/50000 [6:01:05<2:50:40,  1.64it/s]


 66%|█████████████████████▉           | 33179/50000 [6:01:05<2:49:00,  1.66it/s]


 66%|█████████████████████▉           | 33180/50000 [6:01:06<2:52:35,  1.62it/s]


 66%|█████████████████████▉           | 33181/50000 [6:01:07<2:56:09,  1.59it/s]


 66%|█████████████████████▉           | 33182/50000 [6:01:07<2:55:51,  1.59it/s]


 66%|█████████████████████▉           | 33183/50000 [6:01:08<3:05:16,  1.51it/s]


 66%|█████████████████████▉           | 33184/50000 [6:01:09<2:58:43,  1.57it/s]


 66%|█████████████████████▉           | 33185/50000 [6:01:10<3:17:02,  1.42it/s]


 66%|█████████████████████▉           | 33186/50000 [6:01:10<3:06:21,  1.50it/s]


 66%|█████████████████████▉           | 33187/50000 [6:01:11<3:22:22,  1.38it/s]


 66%|█████████████████████▉           | 33188/50000 [6:01:12<3:11:42,  1.46it/s]


 66%|█████████████████████▉           | 33189/50000 [6:01:12<3:07:24,  1.50it/s]


 66%|█████████████████████▉           | 33190/50000 [6:01:13<3:01:14,  1.55it/s]


 66%|█████████████████████▉           | 33191/50000 [6:01:13<2:54:45,  1.60it/s]


 66%|█████████████████████▉           | 33192/50000 [6:01:14<2:59:46,  1.56it/s]


 66%|█████████████████████▉           | 33193/50000 [6:01:15<3:00:36,  1.55it/s]


 66%|█████████████████████▉           | 33194/50000 [6:01:15<2:56:55,  1.58it/s]


 66%|█████████████████████▉           | 33195/50000 [6:01:16<3:06:46,  1.50it/s]


 66%|█████████████████████▉           | 33196/50000 [6:01:17<3:14:13,  1.44it/s]


 66%|█████████████████████▉           | 33197/50000 [6:01:17<3:09:39,  1.48it/s]


 66%|█████████████████████▉           | 33198/50000 [6:01:18<3:14:05,  1.44it/s]


 66%|█████████████████████▉           | 33199/50000 [6:01:19<3:08:32,  1.49it/s]


 66%|█████████████████████▉           | 33200/50000 [6:01:19<3:05:44,  1.51it/s]
                                                                                
{'loss': 3.2058, 'grad_norm': 8.616000175476074, 'learning_rate': 0.00033600000000000004, 'epoch': 1.74}

 66%|█████████████████████▉           | 33200/50000 [6:01:19<3:05:44,  1.51it/s]


{'loss': 3.248, 'grad_norm': 3.0452826023101807, 'learning_rate': 0.00033400000000000004, 'epoch': 1.74}

 67%|█████████████████████▉           | 33300/50000 [6:02:24<2:48:07,  1.66it/s]


{'loss': 3.2434, 'grad_norm': 5.34166955947876, 'learning_rate': 0.00033200000000000005, 'epoch': 1.75}

 67%|██████████████████████           | 33400/50000 [6:03:29<3:08:20,  1.47it/s]


{'loss': 3.2844, 'grad_norm': 3.857142925262451, 'learning_rate': 0.00033, 'epoch': 1.75}

 67%|██████████████████████           | 33500/50000 [6:04:34<3:09:23,  1.45it/s]


{'loss': 3.2123, 'grad_norm': 3.0071585178375244, 'learning_rate': 0.000328, 'epoch': 1.76}

 67%|██████████████████████▏          | 33600/50000 [6:05:39<2:52:35,  1.58it/s]


{'loss': 3.2144, 'grad_norm': 5.4860992431640625, 'learning_rate': 0.000326, 'epoch': 1.76}

 67%|██████████████████████▏          | 33700/50000 [6:06:45<2:46:46,  1.63it/s]


{'loss': 3.2518, 'grad_norm': 3.297224760055542, 'learning_rate': 0.000324, 'epoch': 1.77}

 68%|██████████████████████▎          | 33800/50000 [6:07:49<2:59:08,  1.51it/s]




 68%|██████████████████████▎          | 33873/50000 [6:08:36<2:40:34,  1.67it/s]


 68%|██████████████████████▎          | 33874/50000 [6:08:36<2:58:45,  1.50it/s]


 68%|██████████████████████▎          | 33875/50000 [6:08:37<3:08:12,  1.43it/s]


 68%|██████████████████████▎          | 33876/50000 [6:08:38<3:00:23,  1.49it/s]


 68%|██████████████████████▎          | 33877/50000 [6:08:39<3:11:38,  1.40it/s]


 68%|██████████████████████▎          | 33878/50000 [6:08:39<3:03:16,  1.47it/s]


 68%|██████████████████████▎          | 33879/50000 [6:08:40<3:03:04,  1.47it/s]


 68%|██████████████████████▎          | 33880/50000 [6:08:41<3:03:26,  1.46it/s]


 68%|██████████████████████▎          | 33881/50000 [6:08:41<3:05:45,  1.45it/s]


 68%|██████████████████████▎          | 33882/50000 [6:08:42<3:04:48,  1.45it/s]


 68%|██████████████████████▎          | 33883/50000 [6:08:43<2:58:28,  1.51it/s]


 68%|██████████████████████▎          | 33884/50000 [6:08:43<2:53:05,  1.55it/s]


 68%|██████████████████████▎          | 33885/50000 [6:08:44<2:52:49,  1.55it/s]


 68%|██████████████████████▎          | 33886/50000 [6:08:44<2:51:08,  1.57it/s]


 68%|██████████████████████▎          | 33887/50000 [6:08:45<2:55:11,  1.53it/s]


 68%|██████████████████████▎          | 33888/50000 [6:08:46<2:46:30,  1.61it/s]


 68%|██████████████████████▎          | 33889/50000 [6:08:46<2:51:05,  1.57it/s]


 68%|██████████████████████▎          | 33890/50000 [6:08:47<2:50:15,  1.58it/s]


 68%|██████████████████████▎          | 33891/50000 [6:08:48<2:46:19,  1.61it/s]


 68%|██████████████████████▎          | 33892/50000 [6:08:48<2:47:37,  1.60it/s]


 68%|██████████████████████▎          | 33893/50000 [6:08:49<2:49:50,  1.58it/s]


 68%|██████████████████████▎          | 33894/50000 [6:08:49<2:46:47,  1.61it/s]


 68%|██████████████████████▎          | 33895/50000 [6:08:50<2:56:36,  1.52it/s]


 68%|██████████████████████▎          | 33896/50000 [6:08:51<2:57:10,  1.51it/s]


 68%|██████████████████████▎          | 33897/50000 [6:08:51<2:52:25,  1.56it/s]


 68%|██████████████████████▎          | 33898/50000 [6:08:52<2:51:09,  1.57it/s]


 68%|██████████████████████▎          | 33899/50000 [6:08:53<2:51:46,  1.56it/s]


 68%|██████████████████████▎          | 33900/50000 [6:08:53<2:46:40,  1.61it/s]
{'loss': 3.2552, 'grad_norm': 9.185192108154297, 'learning_rate': 0.000322, 'epoch': 1.77}
 68%|██████████████████████▍          | 34000/50000 [6:09:57<2:58:03,  1.50it/s]
{'loss': 3.2345, 'grad_norm': 4.991178512573242, 'learning_rate': 0.00032, 'epoch': 1.78}
 68%|██████████████████████▌          | 34100/50000 [6:11:02<2:45:16,  1.60it/s]
{'loss': 3.2021, 'grad_norm': 4.424311637878418, 'learning_rate': 0.00031800000000000003, 'epoch': 1.79}
 68%|██████████████████████▌          | 34200/50000 [6:12:07<2:48:54,  1.56it/s]
{'loss': 3.2145, 'grad_norm': 3.410863161087036, 'learning_rate': 0.000316, 'epoch': 1.79}
{'loss': 3.1966, 'grad_norm': 2.998995065689087, 'learning_rate': 0.000314, 'epoch': 1.8}

 69%|██████████████████████▋          | 34300/50000 [6:13:12<3:03:02,  1.43it/s]


{'loss': 3.2098, 'grad_norm': 3.540569305419922, 'learning_rate': 0.000312, 'epoch': 1.8}

 69%|██████████████████████▋          | 34400/50000 [6:14:18<2:51:21,  1.52it/s]


{'loss': 3.248, 'grad_norm': 4.682790279388428, 'learning_rate': 0.00031, 'epoch': 1.81}

 69%|██████████████████████▊          | 34500/50000 [6:15:23<2:57:19,  1.46it/s]


{'loss': 3.1899, 'grad_norm': 3.041595458984375, 'learning_rate': 0.000308, 'epoch': 1.81}

 69%|██████████████████████▉          | 34700/50000 [6:17:33<2:45:53,  1.54it/s]
                                                                                
{'loss': 3.2345, 'grad_norm': 2.8992249965667725, 'learning_rate': 0.000306, 'epoch': 1.82}

 70%|██████████████████████▉          | 34800/50000 [6:18:37<2:45:55,  1.53it/s]
                                                                                
{'loss': 3.2546, 'grad_norm': 2.9478862285614014, 'learning_rate': 0.000304, 'epoch': 1.82}

 70%|███████████████████████          | 34900/50000 [6:19:41<2:34:11,  1.63it/s]
                                                                                
{'loss': 3.2329, 'grad_norm': 3.1378726959228516, 'learning_rate': 0.000302, 'epoch': 1.83}



 70%|███████████████████████          | 34983/50000 [6:20:35<2:40:27,  1.56it/s]


 70%|███████████████████████          | 34984/50000 [6:20:35<2:46:25,  1.50it/s]


 70%|███████████████████████          | 34985/50000 [6:20:36<2:45:36,  1.51it/s]


 70%|███████████████████████          | 34986/50000 [6:20:37<2:57:49,  1.41it/s]


 70%|███████████████████████          | 34987/50000 [6:20:38<2:53:16,  1.44it/s]


 70%|███████████████████████          | 34988/50000 [6:20:38<2:47:02,  1.50it/s]


 70%|███████████████████████          | 34989/50000 [6:20:39<2:41:07,  1.55it/s]


 70%|███████████████████████          | 34990/50000 [6:20:39<2:41:43,  1.55it/s]


 70%|███████████████████████          | 34991/50000 [6:20:40<2:39:53,  1.56it/s]


 70%|███████████████████████          | 34992/50000 [6:20:41<2:35:22,  1.61it/s]


 70%|███████████████████████          | 34993/50000 [6:20:41<2:28:04,  1.69it/s]


 70%|███████████████████████          | 34994/50000 [6:20:42<2:25:00,  1.72it/s]


 70%|███████████████████████          | 34995/50000 [6:20:42<2:18:06,  1.81it/s]


 70%|███████████████████████          | 34996/50000 [6:20:43<2:21:26,  1.77it/s]


 70%|███████████████████████          | 34997/50000 [6:20:43<2:27:30,  1.70it/s]


 70%|███████████████████████          | 34998/50000 [6:20:44<2:32:10,  1.64it/s]


 70%|███████████████████████          | 34999/50000 [6:20:45<2:40:50,  1.55it/s]


 70%|███████████████████████          | 35000/50000 [6:20:46<2:51:13,  1.46it/s]
                                                                                
{'loss': 3.2178, 'grad_norm': 2.7939293384552, 'learning_rate': 0.0003, 'epoch': 1.83}

 70%|███████████████████████          | 35000/50000 [6:20:46<2:51:13,  1.46it/s]
***** Running Evaluation *****
  Num examples = 50
  Batch size = 16




{'eval_rouge-1': 33.078584, 'eval_rouge-2': 8.306274, 'eval_rouge-l': 25.673636000000002, 'eval_bleu-4': 0.03771101132487006, 'eval_runtime': 20.204, 'eval_samples_per_second': 2.475, 'eval_steps_per_second': 0.198, 'epoch': 1.83}

Saving model checkpoint to ./output/tmp-checkpoint-35000


tokenizer config file saved in ./output/tmp-checkpoint-35000/tokenizer_config.json
Special tokens file saved in ./output/tmp-checkpoint-35000/special_tokens_map.json



{'loss': 3.2265, 'grad_norm': 3.2233057022094727, 'learning_rate': 0.000298, 'epoch': 1.84}

 70%|███████████████████████▏         | 35100/50000 [6:22:11<2:44:34,  1.51it/s]


{'loss': 3.2248, 'grad_norm': 2.805739164352417, 'learning_rate': 0.000296, 'epoch': 1.84}

 70%|███████████████████████▏         | 35200/50000 [6:23:16<2:41:13,  1.53it/s]


{'loss': 3.2311, 'grad_norm': 3.5853281021118164, 'learning_rate': 0.000294, 'epoch': 1.85}

 71%|███████████████████████▎         | 35400/50000 [6:25:26<2:30:17,  1.62it/s]
                                                                                
{'loss': 3.1958, 'grad_norm': 3.1801910400390625, 'learning_rate': 0.000292, 'epoch': 1.85}

 71%|███████████████████████▍         | 35500/50000 [6:26:32<2:38:03,  1.53it/s]
                                                                                
{'loss': 3.2046, 'grad_norm': 4.050107479095459, 'learning_rate': 0.00029, 'epoch': 1.86}

 71%|███████████████████████▍         | 35600/50000 [6:27:36<2:22:03,  1.69it/s]
                                                                                
{'loss': 3.2122, 'grad_norm': 2.64692759513855, 'learning_rate': 0.000288, 'epoch': 1.86}

 71%|███████████████████████▌         | 35700/50000 [6:28:40<2:34:44,  1.54it/s]
                                                                                
{'loss': 3.2176, 'grad_norm': 2.7878096103668213, 'learning_rate': 0.00028599999999999996, 'epoch': 1.87}

 72%|███████████████████████▋         | 35800/50000 [6:29:44<2:33:55,  1.54it/s]
                                                                                
{'loss': 3.2211, 'grad_norm': 3.507983684539795, 'learning_rate': 0.00028399999999999996, 'epoch': 1.87}

 72%|███████████████████████▋         | 35900/50000 [6:30:48<2:28:50,  1.58it/s]
                                                                                
{'loss': 3.2327, 'grad_norm': 3.3519856929779053, 'learning_rate': 0.00028199999999999997, 'epoch': 1.88}

 72%|███████████████████████▊         | 36000/50000 [6:31:53<2:29:42,  1.56it/s]
                                                                                
{'loss': 3.2058, 'grad_norm': 3.6349074840545654, 'learning_rate': 0.00028000000000000003, 'epoch': 1.88}



 72%|███████████████████████▊         | 36076/50000 [6:32:43<2:33:59,  1.51it/s]


 72%|███████████████████████▊         | 36077/50000 [6:32:44<2:39:54,  1.45it/s]


 72%|███████████████████████▊         | 36078/50000 [6:32:44<2:31:12,  1.53it/s]


 72%|███████████████████████▊         | 36079/50000 [6:32:45<2:19:36,  1.66it/s]


 72%|███████████████████████▊         | 36080/50000 [6:32:45<2:27:37,  1.57it/s]


 72%|███████████████████████▊         | 36081/50000 [6:32:46<2:25:12,  1.60it/s]


 72%|███████████████████████▊         | 36082/50000 [6:32:47<2:28:46,  1.56it/s]


 72%|███████████████████████▊         | 36083/50000 [6:32:47<2:23:47,  1.61it/s]


 72%|███████████████████████▊         | 36084/50000 [6:32:48<2:28:13,  1.56it/s]


 72%|███████████████████████▊         | 36085/50000 [6:32:49<2:35:28,  1.49it/s]


 72%|███████████████████████▊         | 36086/50000 [6:32:49<2:36:32,  1.48it/s]


 72%|███████████████████████▊         | 36087/50000 [6:32:50<2:30:31,  1.54it/s]


 72%|███████████████████████▊         | 36088/50000 [6:32:51<2:39:16,  1.46it/s]


 72%|███████████████████████▊         | 36089/50000 [6:32:51<2:37:18,  1.47it/s]


 72%|███████████████████████▊         | 36090/50000 [6:32:52<2:43:06,  1.42it/s]


 72%|███████████████████████▊         | 36091/50000 [6:32:53<2:35:05,  1.49it/s]


 72%|███████████████████████▊         | 36092/50000 [6:32:53<2:31:43,  1.53it/s]


 72%|███████████████████████▊         | 36093/50000 [6:32:54<2:38:47,  1.46it/s]


 72%|███████████████████████▊         | 36094/50000 [6:32:55<2:31:13,  1.53it/s]


 72%|███████████████████████▊         | 36095/50000 [6:32:55<2:32:55,  1.52it/s]


 72%|███████████████████████▊         | 36096/50000 [6:32:56<2:40:02,  1.45it/s]


 72%|███████████████████████▊         | 36097/50000 [6:32:57<2:46:19,  1.39it/s]


 72%|███████████████████████▊         | 36098/50000 [6:32:58<2:39:34,  1.45it/s]


 72%|███████████████████████▊         | 36099/50000 [6:32:58<2:36:59,  1.48it/s]


 72%|███████████████████████▊         | 36100/50000 [6:32:59<2:27:44,  1.57it/s]
                                                                                
{'loss': 3.1717, 'grad_norm': 3.0129215717315674, 'learning_rate': 0.00027800000000000004, 'epoch': 1.89}

 72%|███████████████████████▉         | 36200/50000 [6:34:03<2:21:32,  1.62it/s]
                                                                                
{'loss': 3.2271, 'grad_norm': 3.0183732509613037, 'learning_rate': 0.00027600000000000004, 'epoch': 1.9}

 73%|███████████████████████▉         | 36300/50000 [6:35:08<2:21:56,  1.61it/s]
                                                                                
{'loss': 3.1857, 'grad_norm': 3.0072438716888428, 'learning_rate': 0.00027400000000000005, 'epoch': 1.9}

 73%|████████████████████████         | 36400/50000 [6:36:13<2:21:48,  1.60it/s]
                                                                                
{'loss': 3.2129, 'grad_norm': 3.1849911212921143, 'learning_rate': 0.00027200000000000005, 'epoch': 1.91}

 73%|████████████████████████         | 36400/50000 [6:36:13<2:21:48,  1.60it/s]


 73%|████████████████████████         | 36500/50000 [6:37:18<2:28:14,  1.52it/s]
                                                                                
{'loss': 3.186, 'grad_norm': 3.2472734451293945, 'learning_rate': 0.00027, 'epoch': 1.91}

 73%|████████████████████████▏        | 36600/50000 [6:38:22<2:13:28,  1.67it/s]
                                                                                
{'loss': 3.2347, 'grad_norm': 4.008800983428955, 'learning_rate': 0.000268, 'epoch': 1.92}

 73%|████████████████████████▏        | 36700/50000 [6:39:28<2:19:44,  1.59it/s]
                                                                                
{'loss': 3.2345, 'grad_norm': 3.548147201538086, 'learning_rate': 0.000266, 'epoch': 1.92}

 74%|████████████████████████▎        | 36800/50000 [6:40:35<2:30:16,  1.46it/s]
                                                                                
{'loss': 3.1884, 'grad_norm': 4.720425128936768, 'learning_rate': 0.000264, 'epoch': 1.93}

{'loss': 3.1627, 'grad_norm': 2.9137051105499268, 'learning_rate': 0.000262, 'epoch': 1.93}

{'loss': 3.2325, 'grad_norm': 3.0540711879730225, 'learning_rate': 0.00026000000000000003, 'epoch': 1.94}

{'loss': 3.2431, 'grad_norm': 3.1152901649475098, 'learning_rate': 0.00025800000000000004, 'epoch': 1.94}



 74%|████████████████████████▌        | 37185/50000 [6:44:45<2:21:20,  1.51it/s]


 74%|████████████████████████▌        | 37186/50000 [6:44:46<2:21:07,  1.51it/s]


 74%|████████████████████████▌        | 37187/50000 [6:44:46<2:21:22,  1.51it/s]


 74%|████████████████████████▌        | 37188/50000 [6:44:47<2:19:30,  1.53it/s]


 74%|████████████████████████▌        | 37189/50000 [6:44:48<2:19:24,  1.53it/s]


 74%|████████████████████████▌        | 37190/50000 [6:44:48<2:22:59,  1.49it/s]


 74%|████████████████████████▌        | 37191/50000 [6:44:49<2:22:16,  1.50it/s]


 74%|████████████████████████▌        | 37192/50000 [6:44:50<2:16:30,  1.56it/s]


 74%|████████████████████████▌        | 37193/50000 [6:44:50<2:10:51,  1.63it/s]


 74%|████████████████████████▌        | 37194/50000 [6:44:51<2:07:41,  1.67it/s]


 74%|████████████████████████▌        | 37195/50000 [6:44:51<2:05:28,  1.70it/s]


 74%|████████████████████████▌        | 37196/50000 [6:44:52<2:14:37,  1.59it/s]


 74%|████████████████████████▌        | 37197/50000 [6:44:53<2:20:34,  1.52it/s]


 74%|████████████████████████▌        | 37198/50000 [6:44:53<2:16:28,  1.56it/s]


 74%|████████████████████████▌        | 37199/50000 [6:44:54<2:11:59,  1.62it/s]


 74%|████████████████████████▌        | 37200/50000 [6:44:54<2:05:58,  1.69it/s]


                                                                                
{'loss': 3.2234, 'grad_norm': 4.83553409576416, 'learning_rate': 0.000256, 'epoch': 1.95}

 74%|████████████████████████▌        | 37200/50000 [6:44:54<2:05:58,  1.69it/s]


{'loss': 3.2351, 'grad_norm': 3.2425012588500977, 'learning_rate': 0.000254, 'epoch': 1.95}

 75%|████████████████████████▌        | 37300/50000 [6:46:00<2:11:28,  1.61it/s]


{'loss': 3.1705, 'grad_norm': 2.974452257156372, 'learning_rate': 0.000252, 'epoch': 1.96}

 75%|████████████████████████▋        | 37400/50000 [6:47:05<2:29:04,  1.41it/s]


{'loss': 3.2267, 'grad_norm': 3.065948486328125, 'learning_rate': 0.00025, 'epoch': 1.96}

 75%|████████████████████████▊        | 37500/50000 [6:48:11<2:26:15,  1.42it/s]




 75%|████████████████████████▊        | 37554/50000 [6:48:46<2:11:46,  1.57it/s]


 75%|████████████████████████▊        | 37555/50000 [6:48:47<2:20:04,  1.48it/s]


 75%|████████████████████████▊        | 37556/50000 [6:48:47<2:15:28,  1.53it/s]


 75%|████████████████████████▊        | 37557/50000 [6:48:48<2:11:51,  1.57it/s]


 75%|████████████████████████▊        | 37558/50000 [6:48:49<2:12:51,  1.56it/s]


 75%|████████████████████████▊        | 37559/50000 [6:48:49<2:07:42,  1.62it/s]


 75%|████████████████████████▊        | 37560/50000 [6:48:50<2:10:44,  1.59it/s]


 75%|████████████████████████▊        | 37561/50000 [6:48:50<2:16:01,  1.52it/s]


 75%|████████████████████████▊        | 37562/50000 [6:48:51<2:15:21,  1.53it/s]


 75%|████████████████████████▊        | 37563/50000 [6:48:52<2:26:39,  1.41it/s]


 75%|████████████████████████▊        | 37564/50000 [6:48:53<2:22:01,  1.46it/s]


 75%|████████████████████████▊        | 37565/50000 [6:48:53<2:14:50,  1.54it/s]


 75%|████████████████████████▊        | 37566/50000 [6:48:54<2:12:58,  1.56it/s]


 75%|████████████████████████▊        | 37567/50000 [6:48:54<2:13:35,  1.55it/s]


 75%|████████████████████████▊        | 37568/50000 [6:48:55<2:10:06,  1.59it/s]


 75%|████████████████████████▊        | 37569/50000 [6:48:56<2:22:33,  1.45it/s]


 75%|████████████████████████▊        | 37570/50000 [6:48:56<2:13:39,  1.55it/s]


 75%|████████████████████████▊        | 37571/50000 [6:48:57<2:14:30,  1.54it/s]


 75%|████████████████████████▊        | 37572/50000 [6:48:58<2:14:17,  1.54it/s]


 75%|████████████████████████▊        | 37573/50000 [6:48:58<2:13:43,  1.55it/s]


 75%|████████████████████████▊        | 37574/50000 [6:48:59<2:15:16,  1.53it/s]


 75%|████████████████████████▊        | 37575/50000 [6:49:00<2:08:31,  1.61it/s]


 75%|████████████████████████▊        | 37576/50000 [6:49:00<2:08:53,  1.61it/s]


 75%|████████████████████████▊        | 37577/50000 [6:49:01<2:16:21,  1.52it/s]


 75%|████████████████████████▊        | 37578/50000 [6:49:02<2:15:08,  1.53it/s]


 75%|████████████████████████▊        | 37579/50000 [6:49:02<2:18:38,  1.49it/s]


 75%|████████████████████████▊        | 37580/50000 [6:49:03<2:13:36,  1.55it/s]


 75%|████████████████████████▊        | 37581/50000 [6:49:04<2:14:17,  1.54it/s]


 75%|████████████████████████▊        | 37582/50000 [6:49:04<2:08:49,  1.61it/s]


 75%|████████████████████████▊        | 37583/50000 [6:49:05<2:02:43,  1.69it/s]


 75%|████████████████████████▊        | 37584/50000 [6:49:05<2:06:55,  1.63it/s]


 75%|████████████████████████▊        | 37585/50000 [6:49:06<2:17:05,  1.51it/s]


 75%|████████████████████████▊        | 37586/50000 [6:49:07<2:17:46,  1.50it/s]


 75%|████████████████████████▊        | 37587/50000 [6:49:07<2:11:35,  1.57it/s]


 75%|████████████████████████▊        | 37588/50000 [6:49:08<2:23:51,  1.44it/s]


 75%|████████████████████████▊        | 37589/50000 [6:49:09<2:17:21,  1.51it/s]


 75%|████████████████████████▊        | 37590/50000 [6:49:09<2:11:31,  1.57it/s]


 75%|████████████████████████▊        | 37591/50000 [6:49:10<2:20:09,  1.48it/s]


 75%|████████████████████████▊        | 37592/50000 [6:49:11<2:12:42,  1.56it/s]


 75%|████████████████████████▊        | 37593/50000 [6:49:11<2:12:31,  1.56it/s]


 75%|████████████████████████▊        | 37594/50000 [6:49:12<2:08:55,  1.60it/s]


 75%|████████████████████████▊        | 37595/50000 [6:49:12<2:04:50,  1.66it/s]


 75%|████████████████████████▊        | 37596/50000 [6:49:13<2:09:23,  1.60it/s]


 75%|████████████████████████▊        | 37597/50000 [6:49:14<2:15:54,  1.52it/s]


 75%|████████████████████████▊        | 37598/50000 [6:49:15<2:20:01,  1.48it/s]


 75%|████████████████████████▊        | 37599/50000 [6:49:15<2:12:59,  1.55it/s]


 75%|████████████████████████▊        | 37600/50000 [6:49:16<2:12:43,  1.56it/s]
                                                                                
{'loss': 3.2096, 'grad_norm': 2.841057538986206, 'learning_rate': 0.000248, 'epoch': 1.97}

 75%|████████████████████████▉        | 37700/50000 [6:50:21<2:12:49,  1.54it/s]
                                                                                
{'loss': 3.1971, 'grad_norm': 2.9740066528320312, 'learning_rate': 0.000246, 'epoch': 1.97}

 76%|████████████████████████▉        | 37800/50000 [6:51:26<2:08:46,  1.58it/s]
                                                                                
{'loss': 3.1964, 'grad_norm': 2.723212718963623, 'learning_rate': 0.000244, 'epoch': 1.98}

 76%|█████████████████████████        | 37900/50000 [6:52:33<2:10:43,  1.54it/s]
                                                                                
{'loss': 3.21, 'grad_norm': 4.41869592666626, 'learning_rate': 0.000242, 'epoch': 1.98}

 76%|█████████████████████████        | 38000/50000 [6:53:38<2:13:13,  1.50it/s]
                                                                                
{'loss': 3.1654, 'grad_norm': 2.9110653400421143, 'learning_rate': 0.00024, 'epoch': 1.99}

 76%|█████████████████████████▏       | 38100/50000 [6:54:43<2:09:05,  1.54it/s]
                                                                                
{'loss': 3.2008, 'grad_norm': 3.5176267623901367, 'learning_rate': 0.00023799999999999998, 'epoch': 1.99}

 76%|█████████████████████████▏       | 38200/50000 [6:55:48<2:07:03,  1.55it/s]
                                                                                
{'loss': 3.1892, 'grad_norm': 6.471377849578857, 'learning_rate': 0.000236, 'epoch': 2.0}



 77%|█████████████████████████▎       | 38295/50000 [6:56:50<2:09:29,  1.51it/s]


 77%|█████████████████████████▎       | 38296/50000 [6:56:50<2:10:36,  1.49it/s]


 77%|█████████████████████████▎       | 38297/50000 [6:56:51<2:04:45,  1.56it/s]


 77%|█████████████████████████▎       | 38298/50000 [6:56:52<2:06:35,  1.54it/s]


 77%|█████████████████████████▎       | 38299/50000 [6:56:53<2:18:12,  1.41it/s]


 77%|█████████████████████████▎       | 38300/50000 [6:56:53<2:18:11,  1.41it/s]
                                                                                
{'loss': 3.1905, 'grad_norm': 3.2558839321136475, 'learning_rate': 0.00023400000000000002, 'epoch': 2.01}

 77%|█████████████████████████▎       | 38300/50000 [6:56:53<2:18:11,  1.41it/s]


{'loss': 3.1346, 'grad_norm': 2.6527233123779297, 'learning_rate': 0.00023200000000000003, 'epoch': 2.01}

 77%|█████████████████████████▎       | 38400/50000 [6:57:58<2:08:50,  1.50it/s]


{'loss': 3.194, 'grad_norm': 3.1614513397216797, 'learning_rate': 0.00023, 'epoch': 2.02}

 77%|█████████████████████████▍       | 38500/50000 [6:59:03<2:04:32,  1.54it/s]


{'loss': 3.1754, 'grad_norm': 3.8445942401885986, 'learning_rate': 0.000228, 'epoch': 2.02}

 77%|█████████████████████████▍       | 38600/50000 [7:00:09<2:07:14,  1.49it/s]




 77%|█████████████████████████▌       | 38664/50000 [7:00:51<2:10:13,  1.45it/s]


 77%|█████████████████████████▌       | 38665/50000 [7:00:51<2:07:21,  1.48it/s]


 77%|█████████████████████████▌       | 38666/50000 [7:00:52<2:06:48,  1.49it/s]


 77%|█████████████████████████▌       | 38667/50000 [7:00:52<2:01:46,  1.55it/s]


 77%|█████████████████████████▌       | 38668/50000 [7:00:53<1:57:33,  1.61it/s]


 77%|█████████████████████████▌       | 38669/50000 [7:00:54<1:56:41,  1.62it/s]


 77%|█████████████████████████▌       | 38670/50000 [7:00:54<1:57:52,  1.60it/s]


 77%|█████████████████████████▌       | 38671/50000 [7:00:55<1:57:40,  1.60it/s]


 77%|█████████████████████████▌       | 38672/50000 [7:00:56<1:59:59,  1.57it/s]


 77%|█████████████████████████▌       | 38673/50000 [7:00:56<2:00:54,  1.56it/s]


 77%|█████████████████████████▌       | 38674/50000 [7:00:57<1:58:20,  1.60it/s]


 77%|█████████████████████████▌       | 38675/50000 [7:00:57<1:52:25,  1.68it/s]


 77%|█████████████████████████▌       | 38676/50000 [7:00:58<2:01:27,  1.55it/s]


 77%|█████████████████████████▌       | 38677/50000 [7:00:59<2:01:13,  1.56it/s]


 77%|█████████████████████████▌       | 38678/50000 [7:00:59<2:00:41,  1.56it/s]


 77%|█████████████████████████▌       | 38679/50000 [7:01:00<2:04:42,  1.51it/s]


 77%|█████████████████████████▌       | 38680/50000 [7:01:01<2:02:26,  1.54it/s]


 77%|█████████████████████████▌       | 38681/50000 [7:01:01<2:08:28,  1.47it/s]


 77%|█████████████████████████▌       | 38682/50000 [7:01:02<2:06:48,  1.49it/s]


 77%|█████████████████████████▌       | 38683/50000 [7:01:03<2:04:54,  1.51it/s]


 77%|█████████████████████████▌       | 38684/50000 [7:01:03<2:00:46,  1.56it/s]


 77%|█████████████████████████▌       | 38685/50000 [7:01:04<2:02:51,  1.53it/s]


 77%|█████████████████████████▌       | 38686/50000 [7:01:05<2:03:57,  1.52it/s]


 77%|█████████████████████████▌       | 38687/50000 [7:01:05<2:00:26,  1.57it/s]


 77%|█████████████████████████▌       | 38688/50000 [7:01:06<2:05:35,  1.50it/s]


 77%|█████████████████████████▌       | 38689/50000 [7:01:07<2:09:14,  1.46it/s]


 77%|█████████████████████████▌       | 38690/50000 [7:01:07<2:12:29,  1.42it/s]


 77%|█████████████████████████▌       | 38691/50000 [7:01:08<2:08:10,  1.47it/s]


 77%|█████████████████████████▌       | 38692/50000 [7:01:09<2:01:02,  1.56it/s]


 77%|█████████████████████████▌       | 38693/50000 [7:01:09<2:01:45,  1.55it/s]


 77%|█████████████████████████▌       | 38694/50000 [7:01:10<1:58:19,  1.59it/s]


 77%|█████████████████████████▌       | 38695/50000 [7:01:11<1:57:17,  1.61it/s]


 77%|█████████████████████████▌       | 38696/50000 [7:01:11<2:04:34,  1.51it/s]


 77%|█████████████████████████▌       | 38697/50000 [7:01:12<1:56:30,  1.62it/s]


 77%|█████████████████████████▌       | 38698/50000 [7:01:12<2:02:05,  1.54it/s]


 77%|█████████████████████████▌       | 38699/50000 [7:01:13<1:57:48,  1.60it/s]


 77%|█████████████████████████▌       | 38700/50000 [7:01:14<1:59:44,  1.57it/s]
                                                                                
{'loss': 3.1652, 'grad_norm': 3.553438425064087, 'learning_rate': 0.00022600000000000002, 'epoch': 2.03}

 77%|█████████████████████████▌       | 38700/50000 [7:01:14<1:59:44,  1.57it/s]


{'loss': 3.1845, 'grad_norm': 3.47965669631958, 'learning_rate': 0.000224, 'epoch': 2.03}

 78%|█████████████████████████▌       | 38800/50000 [7:02:20<2:27:31,  1.27it/s]


{'loss': 3.1646, 'grad_norm': 3.164341449737549, 'learning_rate': 0.000222, 'epoch': 2.04}

 78%|█████████████████████████▋       | 38900/50000 [7:03:25<1:58:21,  1.56it/s]


{'loss': 3.1502, 'grad_norm': 3.6389522552490234, 'learning_rate': 0.00022, 'epoch': 2.04}

 78%|█████████████████████████▋       | 39000/50000 [7:04:30<2:04:41,  1.47it/s]


 78%|█████████████████████████▊       | 39100/50000 [7:05:35<1:59:28,  1.52it/s]
                                                                                
{'loss': 3.2022, 'grad_norm': 2.8556880950927734, 'learning_rate': 0.000218, 'epoch': 2.05}

 78%|█████████████████████████▊       | 39200/50000 [7:06:40<1:54:41,  1.57it/s]
                                                                                
{'loss': 3.1822, 'grad_norm': 2.8396432399749756, 'learning_rate': 0.000216, 'epoch': 2.05}

 79%|█████████████████████████▉       | 39300/50000 [7:07:44<1:55:19,  1.55it/s]
                                                                                
{'loss': 3.1871, 'grad_norm': 3.0813848972320557, 'learning_rate': 0.000214, 'epoch': 2.06}

 79%|██████████████████████████       | 39400/50000 [7:08:49<1:49:02,  1.62it/s]
                                                                                
{'loss': 3.1955, 'grad_norm': 2.6454498767852783, 'learning_rate': 0.000212, 'epoch': 2.06}

 79%|██████████████████████████       | 39400/50000 [7:08:49<1:49:02,  1.62it/s]


{'loss': 3.139, 'grad_norm': 3.0655133724212646, 'learning_rate': 0.00021, 'epoch': 2.07}

 79%|██████████████████████████       | 39500/50000 [7:09:56<1:59:58,  1.46it/s]


{'loss': 3.1667, 'grad_norm': 3.024686336517334, 'learning_rate': 0.000208, 'epoch': 2.07}

 79%|██████████████████████████▏      | 39600/50000 [7:11:00<1:48:02,  1.60it/s]


{'loss': 3.1629, 'grad_norm': 3.6769633293151855, 'learning_rate': 0.000206, 'epoch': 2.08}

 79%|██████████████████████████▏      | 39700/50000 [7:12:06<2:10:51,  1.31it/s]




 80%|██████████████████████████▎      | 39774/50000 [7:12:54<1:50:46,  1.54it/s]


 80%|██████████████████████████▎      | 39775/50000 [7:12:55<1:58:59,  1.43it/s]


 80%|██████████████████████████▎      | 39776/50000 [7:12:56<1:56:39,  1.46it/s]


 80%|██████████████████████████▎      | 39777/50000 [7:12:56<1:50:13,  1.55it/s]


 80%|██████████████████████████▎      | 39778/50000 [7:12:57<1:56:12,  1.47it/s]


 80%|██████████████████████████▎      | 39779/50000 [7:12:58<1:51:50,  1.52it/s]


 80%|██████████████████████████▎      | 39780/50000 [7:12:58<1:48:28,  1.57it/s]


 80%|██████████████████████████▎      | 39781/50000 [7:12:59<1:54:32,  1.49it/s]


 80%|██████████████████████████▎      | 39782/50000 [7:13:00<1:57:31,  1.45it/s]


 80%|██████████████████████████▎      | 39783/50000 [7:13:00<1:51:42,  1.52it/s]


 80%|██████████████████████████▎      | 39784/50000 [7:13:01<2:00:01,  1.42it/s]


 80%|██████████████████████████▎      | 39785/50000 [7:13:02<1:54:36,  1.49it/s]


 80%|██████████████████████████▎      | 39786/50000 [7:13:02<1:53:23,  1.50it/s]


 80%|██████████████████████████▎      | 39787/50000 [7:13:03<1:56:32,  1.46it/s]


 80%|██████████████████████████▎      | 39788/50000 [7:13:04<1:59:44,  1.42it/s]


 80%|██████████████████████████▎      | 39789/50000 [7:13:04<1:49:05,  1.56it/s]


 80%|██████████████████████████▎      | 39790/50000 [7:13:05<1:45:59,  1.61it/s]


 80%|██████████████████████████▎      | 39791/50000 [7:13:05<1:48:20,  1.57it/s]


 80%|██████████████████████████▎      | 39792/50000 [7:13:06<1:49:38,  1.55it/s]


 80%|██████████████████████████▎      | 39793/50000 [7:13:07<1:49:49,  1.55it/s]


 80%|██████████████████████████▎      | 39794/50000 [7:13:08<2:00:00,  1.42it/s]


 80%|██████████████████████████▎      | 39795/50000 [7:13:08<1:53:11,  1.50it/s]


 80%|██████████████████████████▎      | 39796/50000 [7:13:09<1:49:21,  1.56it/s]


 80%|██████████████████████████▎      | 39797/50000 [7:13:09<1:44:43,  1.62it/s]


 80%|██████████████████████████▎      | 39798/50000 [7:13:10<1:46:36,  1.59it/s]


 80%|██████████████████████████▎      | 39799/50000 [7:13:11<1:51:11,  1.53it/s]


 80%|██████████████████████████▎      | 39800/50000 [7:13:11<1:51:30,  1.52it/s]


                                                                                
{'loss': 3.1491, 'grad_norm': 3.205767869949341, 'learning_rate': 0.000204, 'epoch': 2.08}

 80%|██████████████████████████▎      | 39800/50000 [7:13:11<1:51:30,  1.52it/s]


{'loss': 3.204, 'grad_norm': 4.256750583648682, 'learning_rate': 0.000202, 'epoch': 2.09}

 80%|██████████████████████████▎      | 39900/50000 [7:14:15<1:39:27,  1.69it/s]


{'loss': 3.145, 'grad_norm': 3.597551107406616, 'learning_rate': 0.0002, 'epoch': 2.09}

 80%|██████████████████████████▍      | 40000/50000 [7:15:21<1:50:03,  1.51it/s]
***** Running Evaluation *****
  Num examples = 50
  Batch size = 16

100%|█████████████████████████████████████████████| 4/4 [00:06<00:00,  1.62s/it]
{'eval_rouge-1': 34.224652, 'eval_rouge-2': 8.019416000000001, 'eval_rouge-l': 26.25965, 'eval_bleu-4': 0.03878492571749384, 'eval_runtime': 13.2524, 'eval_samples_per_second': 3.773, 'eval_steps_per_second': 0.302, 'epoch': 2.09}

Saving model checkpoint to ./output/tmp-checkpoint-40000
tokenizer config file saved in ./output/tmp-checkpoint-40000/tokenizer_config.json
Special tokens file saved in ./output/tmp-checkpoint-40000/special_tokens_map.json



{'loss': 3.1877, 'grad_norm': 3.2241761684417725, 'learning_rate': 0.00019800000000000002, 'epoch': 2.1}

 80%|██████████████████████████▍      | 40100/50000 [7:16:40<1:49:38,  1.50it/s]




 80%|██████████████████████████▍      | 40127/50000 [7:16:57<1:46:50,  1.54it/s]


 80%|██████████████████████████▍      | 40128/50000 [7:16:58<1:45:36,  1.56it/s]


 80%|██████████████████████████▍      | 40129/50000 [7:16:59<1:44:57,  1.57it/s]


 80%|██████████████████████████▍      | 40130/50000 [7:16:59<1:47:18,  1.53it/s]


 80%|██████████████████████████▍      | 40131/50000 [7:17:00<1:45:49,  1.55it/s]


 80%|██████████████████████████▍      | 40132/50000 [7:17:01<1:41:41,  1.62it/s]


 80%|██████████████████████████▍      | 40133/50000 [7:17:01<1:42:55,  1.60it/s]


 80%|██████████████████████████▍      | 40134/50000 [7:17:02<1:49:46,  1.50it/s]


 80%|██████████████████████████▍      | 40135/50000 [7:17:03<1:44:08,  1.58it/s]


 80%|██████████████████████████▍      | 40136/50000 [7:17:03<1:42:38,  1.60it/s]


 80%|██████████████████████████▍      | 40137/50000 [7:17:04<1:49:32,  1.50it/s]


 80%|██████████████████████████▍      | 40138/50000 [7:17:04<1:42:43,  1.60it/s]


 80%|██████████████████████████▍      | 40139/50000 [7:17:05<1:44:31,  1.57it/s]


 80%|██████████████████████████▍      | 40140/50000 [7:17:06<1:44:48,  1.57it/s]


 80%|██████████████████████████▍      | 40141/50000 [7:17:06<1:47:15,  1.53it/s]


 80%|██████████████████████████▍      | 40142/50000 [7:17:07<1:48:52,  1.51it/s]


 80%|██████████████████████████▍      | 40143/50000 [7:17:08<1:45:20,  1.56it/s]


 80%|██████████████████████████▍      | 40144/50000 [7:17:08<1:46:16,  1.55it/s]


 80%|██████████████████████████▍      | 40145/50000 [7:17:09<1:50:10,  1.49it/s]


 80%|██████████████████████████▍      | 40146/50000 [7:17:10<1:49:58,  1.49it/s]


 80%|██████████████████████████▍      | 40147/50000 [7:17:10<1:53:03,  1.45it/s]


 80%|██████████████████████████▍      | 40148/50000 [7:17:11<1:48:32,  1.51it/s]


 80%|██████████████████████████▍      | 40149/50000 [7:17:12<1:47:12,  1.53it/s]


 80%|██████████████████████████▍      | 40150/50000 [7:17:12<1:43:45,  1.58it/s]


 80%|██████████████████████████▍      | 40151/50000 [7:17:13<1:41:01,  1.62it/s]


 80%|██████████████████████████▌      | 40152/50000 [7:17:13<1:35:49,  1.71it/s]


 80%|██████████████████████████▌      | 40153/50000 [7:17:14<1:36:23,  1.70it/s]


 80%|██████████████████████████▌      | 40154/50000 [7:17:15<1:38:02,  1.67it/s]


 80%|██████████████████████████▌      | 40155/50000 [7:17:15<1:41:04,  1.62it/s]


 80%|██████████████████████████▌      | 40156/50000 [7:17:16<1:34:57,  1.73it/s]


 80%|██████████████████████████▌      | 40157/50000 [7:17:16<1:38:33,  1.66it/s]


 80%|██████████████████████████▌      | 40158/50000 [7:17:17<1:36:26,  1.70it/s]


 80%|██████████████████████████▌      | 40159/50000 [7:17:18<1:39:17,  1.65it/s]


 80%|██████████████████████████▌      | 40160/50000 [7:17:18<1:38:46,  1.66it/s]


 80%|██████████████████████████▌      | 40161/50000 [7:17:19<1:42:08,  1.61it/s]


 80%|██████████████████████████▌      | 40162/50000 [7:17:19<1:39:52,  1.64it/s]


 80%|██████████████████████████▌      | 40163/50000 [7:17:20<1:50:21,  1.49it/s]


 80%|██████████████████████████▌      | 40164/50000 [7:17:21<1:48:40,  1.51it/s]


 80%|██████████████████████████▌      | 40165/50000 [7:17:22<1:56:48,  1.40it/s]


 80%|██████████████████████████▌      | 40166/50000 [7:17:22<1:59:13,  1.37it/s]


 80%|██████████████████████████▌      | 40167/50000 [7:17:23<1:50:53,  1.48it/s]


 80%|██████████████████████████▌      | 40168/50000 [7:17:24<1:50:18,  1.49it/s]


 80%|██████████████████████████▌      | 40169/50000 [7:17:24<1:55:23,  1.42it/s]


 80%|██████████████████████████▌      | 40170/50000 [7:17:25<1:52:35,  1.46it/s]


 80%|██████████████████████████▌      | 40171/50000 [7:17:26<1:50:57,  1.48it/s]


 80%|██████████████████████████▌      | 40172/50000 [7:17:26<1:46:23,  1.54it/s]


 80%|██████████████████████████▌      | 40173/50000 [7:17:27<1:43:47,  1.58it/s]


 80%|██████████████████████████▌      | 40174/50000 [7:17:27<1:37:48,  1.67it/s]


 80%|██████████████████████████▌      | 40175/50000 [7:17:28<1:35:57,  1.71it/s]


 80%|██████████████████████████▌      | 40176/50000 [7:17:29<1:36:11,  1.70it/s]


 80%|██████████████████████████▌      | 40177/50000 [7:17:29<1:36:58,  1.69it/s]


 80%|██████████████████████████▌      | 40178/50000 [7:17:30<1:42:49,  1.59it/s]


 80%|██████████████████████████▌      | 40179/50000 [7:17:31<1:50:26,  1.48it/s]


 80%|██████████████████████████▌      | 40180/50000 [7:17:32<1:55:45,  1.41it/s]


 80%|██████████████████████████▌      | 40181/50000 [7:17:32<1:50:53,  1.48it/s]


 80%|██████████████████████████▌      | 40182/50000 [7:17:33<1:50:33,  1.48it/s]


 80%|██████████████████████████▌      | 40183/50000 [7:17:34<1:54:08,  1.43it/s]


 80%|██████████████████████████▌      | 40184/50000 [7:17:34<1:50:59,  1.47it/s]


 80%|██████████████████████████▌      | 40185/50000 [7:17:35<1:45:58,  1.54it/s]


 80%|██████████████████████████▌      | 40186/50000 [7:17:35<1:47:42,  1.52it/s]


 80%|██████████████████████████▌      | 40187/50000 [7:17:36<1:43:02,  1.59it/s]


 80%|██████████████████████████▌      | 40188/50000 [7:17:37<1:41:09,  1.62it/s]


 80%|██████████████████████████▌      | 40189/50000 [7:17:37<1:47:19,  1.52it/s]


 80%|██████████████████████████▌      | 40190/50000 [7:17:38<1:46:28,  1.54it/s]


 80%|██████████████████████████▌      | 40191/50000 [7:17:39<1:45:34,  1.55it/s]


 80%|██████████████████████████▌      | 40192/50000 [7:17:39<1:44:38,  1.56it/s]


 80%|██████████████████████████▌      | 40193/50000 [7:17:40<1:38:51,  1.65it/s]


 80%|██████████████████████████▌      | 40194/50000 [7:17:40<1:33:55,  1.74it/s]


 80%|██████████████████████████▌      | 40195/50000 [7:17:41<1:36:57,  1.69it/s]


 80%|██████████████████████████▌      | 40196/50000 [7:17:42<1:43:07,  1.58it/s]


 80%|██████████████████████████▌      | 40197/50000 [7:17:42<1:40:15,  1.63it/s]


 80%|██████████████████████████▌      | 40198/50000 [7:17:43<1:41:36,  1.61it/s]


 80%|██████████████████████████▌      | 40199/50000 [7:17:43<1:40:01,  1.63it/s]


 80%|██████████████████████████▌      | 40200/50000 [7:17:44<1:42:35,  1.59it/s]
                                                                                
{'loss': 3.1654, 'grad_norm': 3.6785898208618164, 'learning_rate': 0.00019600000000000002, 'epoch': 2.1}


 80%|██████████████████████████▌      | 40201/50000 [7:17:45<1:42:31,  1.59it/s]


 80%|██████████████████████████▌      | 40202/50000 [7:17:45<1:39:40,  1.64it/s]


 80%|██████████████████████████▌      | 40203/50000 [7:17:46<1:39:30,  1.64it/s]


 80%|██████████████████████████▌      | 40204/50000 [7:17:46<1:36:38,  1.69it/s]


 80%|██████████████████████████▌      | 40205/50000 [7:17:47<1:43:25,  1.58it/s]


 80%|██████████████████████████▌      | 40206/50000 [7:17:48<1:48:11,  1.51it/s]


 80%|██████████████████████████▌      | 40207/50000 [7:17:48<1:39:47,  1.64it/s]


 80%|██████████████████████████▌      | 40208/50000 [7:17:49<1:41:27,  1.61it/s]


 80%|██████████████████████████▌      | 40209/50000 [7:17:50<1:38:55,  1.65it/s]


 80%|██████████████████████████▌      | 40210/50000 [7:17:50<1:37:45,  1.67it/s]


 80%|██████████████████████████▌      | 40211/50000 [7:17:51<1:44:28,  1.56it/s]


 80%|██████████████████████████▌      | 40212/50000 [7:17:52<1:45:01,  1.55it/s]


 80%|██████████████████████████▌      | 40213/50000 [7:17:52<1:53:24,  1.44it/s]


 80%|██████████████████████████▌      | 40214/50000 [7:17:53<1:54:47,  1.42it/s]


 80%|██████████████████████████▌      | 40215/50000 [7:17:54<1:51:00,  1.47it/s]


 80%|██████████████████████████▌      | 40216/50000 [7:17:54<1:45:34,  1.54it/s]


 80%|██████████████████████████▌      | 40217/50000 [7:17:55<1:42:42,  1.59it/s]


 80%|██████████████████████████▌      | 40218/50000 [7:17:56<1:44:24,  1.56it/s]


 80%|██████████████████████████▌      | 40219/50000 [7:17:56<1:44:43,  1.56it/s]


 80%|██████████████████████████▌      | 40220/50000 [7:17:57<1:45:10,  1.55it/s]


 80%|██████████████████████████▌      | 40221/50000 [7:17:57<1:43:58,  1.57it/s]


 80%|██████████████████████████▌      | 40222/50000 [7:17:58<1:45:16,  1.55it/s]


 80%|██████████████████████████▌      | 40223/50000 [7:17:59<1:38:37,  1.65it/s]


 80%|██████████████████████████▌      | 40224/50000 [7:17:59<1:37:32,  1.67it/s]


 80%|██████████████████████████▌      | 40225/50000 [7:18:00<1:46:27,  1.53it/s]


 80%|██████████████████████████▌      | 40226/50000 [7:18:01<1:45:02,  1.55it/s]


 80%|██████████████████████████▌      | 40227/50000 [7:18:01<1:46:41,  1.53it/s]


 80%|██████████████████████████▌      | 40228/50000 [7:18:02<1:40:12,  1.63it/s]


 80%|██████████████████████████▌      | 40229/50000 [7:18:03<1:43:13,  1.58it/s]


 80%|██████████████████████████▌      | 40230/50000 [7:18:03<1:39:32,  1.64it/s]


 80%|██████████████████████████▌      | 40231/50000 [7:18:04<1:38:49,  1.65it/s]


 80%|██████████████████████████▌      | 40232/50000 [7:18:04<1:36:30,  1.69it/s]


 80%|██████████████████████████▌      | 40233/50000 [7:18:05<1:40:26,  1.62it/s]


 80%|██████████████████████████▌      | 40234/50000 [7:18:06<1:45:02,  1.55it/s]


 80%|██████████████████████████▌      | 40235/50000 [7:18:06<1:39:06,  1.64it/s]


 80%|██████████████████████████▌      | 40236/50000 [7:18:07<1:49:35,  1.48it/s]


 80%|██████████████████████████▌      | 40237/50000 [7:18:08<1:44:38,  1.55it/s]


 80%|██████████████████████████▌      | 40238/50000 [7:18:08<1:40:14,  1.62it/s]


 80%|██████████████████████████▌      | 40239/50000 [7:18:09<1:43:36,  1.57it/s]


 80%|██████████████████████████▌      | 40240/50000 [7:18:09<1:37:30,  1.67it/s]


 80%|██████████████████████████▌      | 40241/50000 [7:18:10<1:41:37,  1.60it/s]


 80%|██████████████████████████▌      | 40242/50000 [7:18:11<1:39:14,  1.64it/s]


 80%|██████████████████████████▌      | 40243/50000 [7:18:11<1:39:44,  1.63it/s]


 80%|██████████████████████████▌      | 40244/50000 [7:18:12<1:35:16,  1.71it/s]


 80%|██████████████████████████▌      | 40245/50000 [7:18:12<1:43:32,  1.57it/s]


 80%|██████████████████████████▌      | 40246/50000 [7:18:13<1:45:50,  1.54it/s]


 80%|██████████████████████████▌      | 40247/50000 [7:18:14<1:50:18,  1.47it/s]


 80%|██████████████████████████▌      | 40248/50000 [7:18:14<1:46:48,  1.52it/s]


 80%|██████████████████████████▌      | 40249/50000 [7:18:15<1:43:09,  1.58it/s]


 80%|██████████████████████████▌      | 40250/50000 [7:18:16<1:45:33,  1.54it/s]


 81%|██████████████████████████▌      | 40251/50000 [7:18:16<1:49:28,  1.48it/s]


 81%|██████████████████████████▌      | 40252/50000 [7:18:17<1:46:39,  1.52it/s]


 81%|██████████████████████████▌      | 40253/50000 [7:18:18<1:41:36,  1.60it/s]


 81%|██████████████████████████▌      | 40254/50000 [7:18:18<1:40:14,  1.62it/s]


 81%|██████████████████████████▌      | 40255/50000 [7:18:19<1:38:21,  1.65it/s]


 81%|██████████████████████████▌      | 40256/50000 [7:18:20<1:56:52,  1.39it/s]


 81%|██████████████████████████▌      | 40257/50000 [7:18:20<1:50:19,  1.47it/s]


 81%|██████████████████████████▌      | 40258/50000 [7:18:21<1:50:35,  1.47it/s]


 81%|██████████████████████████▌      | 40259/50000 [7:18:22<1:53:11,  1.43it/s]


 81%|██████████████████████████▌      | 40260/50000 [7:18:22<1:48:54,  1.49it/s]


 81%|██████████████████████████▌      | 40261/50000 [7:18:23<1:49:39,  1.48it/s]


 81%|██████████████████████████▌      | 40262/50000 [7:18:24<1:43:59,  1.56it/s]


 81%|██████████████████████████▌      | 40263/50000 [7:18:24<1:43:42,  1.56it/s]


 81%|██████████████████████████▌      | 40264/50000 [7:18:25<1:42:15,  1.59it/s]


 81%|██████████████████████████▌      | 40265/50000 [7:18:26<1:43:43,  1.56it/s]


 81%|██████████████████████████▌      | 40266/50000 [7:18:26<1:43:12,  1.57it/s]


 81%|██████████████████████████▌      | 40267/50000 [7:18:27<1:40:32,  1.61it/s]


 81%|██████████████████████████▌      | 40268/50000 [7:18:27<1:42:49,  1.58it/s]


 81%|██████████████████████████▌      | 40269/50000 [7:18:28<1:41:01,  1.61it/s]


 81%|██████████████████████████▌      | 40270/50000 [7:18:29<1:43:09,  1.57it/s]


 81%|██████████████████████████▌      | 40271/50000 [7:18:29<1:43:06,  1.57it/s]


 81%|██████████████████████████▌      | 40272/50000 [7:18:30<1:40:53,  1.61it/s]


 81%|██████████████████████████▌      | 40273/50000 [7:18:31<1:43:57,  1.56it/s]


 81%|██████████████████████████▌      | 40274/50000 [7:18:31<1:45:44,  1.53it/s]


 81%|██████████████████████████▌      | 40275/50000 [7:18:32<1:46:37,  1.52it/s]


 81%|██████████████████████████▌      | 40276/50000 [7:18:33<1:40:03,  1.62it/s]


 81%|██████████████████████████▌      | 40277/50000 [7:18:33<1:40:13,  1.62it/s]


 81%|██████████████████████████▌      | 40278/50000 [7:18:34<1:48:07,  1.50it/s]


 81%|██████████████████████████▌      | 40279/50000 [7:18:35<1:57:17,  1.38it/s]


 81%|██████████████████████████▌      | 40280/50000 [7:18:35<1:54:44,  1.41it/s]


 81%|██████████████████████████▌      | 40281/50000 [7:18:36<1:51:59,  1.45it/s]


 81%|██████████████████████████▌      | 40282/50000 [7:18:37<1:49:46,  1.48it/s]


 81%|██████████████████████████▌      | 40283/50000 [7:18:37<1:51:18,  1.45it/s]


 81%|██████████████████████████▌      | 40284/50000 [7:18:38<1:51:07,  1.46it/s]


 81%|██████████████████████████▌      | 40285/50000 [7:18:39<1:48:50,  1.49it/s]


 81%|██████████████████████████▌      | 40286/50000 [7:18:39<1:50:38,  1.46it/s]


 81%|██████████████████████████▌      | 40287/50000 [7:18:40<2:03:16,  1.31it/s]


 81%|██████████████████████████▌      | 40288/50000 [7:18:41<1:53:39,  1.42it/s]


 81%|██████████████████████████▌      | 40289/50000 [7:18:42<1:50:19,  1.47it/s]


 81%|██████████████████████████▌      | 40290/50000 [7:18:42<1:47:58,  1.50it/s]


 81%|██████████████████████████▌      | 40291/50000 [7:18:43<1:46:56,  1.51it/s]


 81%|██████████████████████████▌      | 40292/50000 [7:18:44<1:46:00,  1.53it/s]


 81%|██████████████████████████▌      | 40293/50000 [7:18:44<1:40:43,  1.61it/s]


 81%|██████████████████████████▌      | 40294/50000 [7:18:45<1:49:46,  1.47it/s]


 81%|██████████████████████████▌      | 40295/50000 [7:18:46<1:47:17,  1.51it/s]


 81%|██████████████████████████▌      | 40296/50000 [7:18:46<1:48:16,  1.49it/s]


 81%|██████████████████████████▌      | 40297/50000 [7:18:47<1:48:41,  1.49it/s]


 81%|██████████████████████████▌      | 40298/50000 [7:18:48<1:56:11,  1.39it/s]


 81%|██████████████████████████▌      | 40299/50000 [7:18:48<1:54:06,  1.42it/s]


 81%|██████████████████████████▌      | 40300/50000 [7:18:49<1:50:59,  1.46it/s]
                                                                                
{'loss': 3.1436, 'grad_norm': 2.812462329864502, 'learning_rate': 0.000194, 'epoch': 2.11}


 81%|██████████████████████████▌      | 40301/50000 [7:18:50<1:55:42,  1.40it/s]


 81%|██████████████████████████▌      | 40302/50000 [7:18:50<1:49:01,  1.48it/s]


 81%|██████████████████████████▌      | 40303/50000 [7:18:51<1:44:02,  1.55it/s]


 81%|██████████████████████████▌      | 40304/50000 [7:18:52<1:50:29,  1.46it/s]


 81%|██████████████████████████▌      | 40305/50000 [7:18:52<1:50:15,  1.47it/s]


 81%|██████████████████████████▌      | 40306/50000 [7:18:53<1:49:29,  1.48it/s]


 81%|██████████████████████████▌      | 40307/50000 [7:18:54<1:44:03,  1.55it/s]


 81%|██████████████████████████▌      | 40308/50000 [7:18:54<1:48:55,  1.48it/s]


 81%|██████████████████████████▌      | 40309/50000 [7:18:55<1:43:58,  1.55it/s]


 81%|██████████████████████████▌      | 40310/50000 [7:18:56<1:41:09,  1.60it/s]


 81%|██████████████████████████▌      | 40311/50000 [7:18:56<1:39:43,  1.62it/s]


 81%|██████████████████████████▌      | 40312/50000 [7:18:57<1:39:00,  1.63it/s]


 81%|██████████████████████████▌      | 40313/50000 [7:18:57<1:37:04,  1.66it/s]


 81%|██████████████████████████▌      | 40314/50000 [7:18:58<1:37:32,  1.65it/s]


 81%|██████████████████████████▌      | 40315/50000 [7:18:58<1:33:04,  1.73it/s]


 81%|██████████████████████████▌      | 40316/50000 [7:18:59<1:29:20,  1.81it/s]


 81%|██████████████████████████▌      | 40317/50000 [7:19:00<1:43:41,  1.56it/s]


 81%|██████████████████████████▌      | 40318/50000 [7:19:00<1:43:00,  1.57it/s]


 81%|██████████████████████████▌      | 40319/50000 [7:19:01<1:43:45,  1.56it/s]


 81%|██████████████████████████▌      | 40320/50000 [7:19:02<1:39:05,  1.63it/s]


 81%|██████████████████████████▌      | 40321/50000 [7:19:02<1:38:14,  1.64it/s]


 81%|██████████████████████████▌      | 40322/50000 [7:19:03<1:39:48,  1.62it/s]


 81%|██████████████████████████▌      | 40323/50000 [7:19:04<1:40:17,  1.61it/s]


 81%|██████████████████████████▌      | 40324/50000 [7:19:04<1:43:23,  1.56it/s]


 81%|██████████████████████████▌      | 40325/50000 [7:19:05<1:43:42,  1.55it/s]


 81%|██████████████████████████▌      | 40326/50000 [7:19:06<1:44:49,  1.54it/s]


 81%|██████████████████████████▌      | 40327/50000 [7:19:06<1:45:18,  1.53it/s]


 81%|██████████████████████████▌      | 40328/50000 [7:19:07<1:43:54,  1.55it/s]


 81%|██████████████████████████▌      | 40329/50000 [7:19:07<1:41:00,  1.60it/s]


 81%|██████████████████████████▌      | 40330/50000 [7:19:08<1:46:08,  1.52it/s]


 81%|██████████████████████████▌      | 40331/50000 [7:19:09<1:54:06,  1.41it/s]


 81%|██████████████████████████▌      | 40332/50000 [7:19:09<1:46:30,  1.51it/s]


 81%|██████████████████████████▌      | 40333/50000 [7:19:10<1:41:30,  1.59it/s]


 81%|██████████████████████████▌      | 40334/50000 [7:19:11<1:38:35,  1.63it/s]


 81%|██████████████████████████▌      | 40335/50000 [7:19:11<1:43:42,  1.55it/s]


 81%|██████████████████████████▌      | 40336/50000 [7:19:12<1:43:47,  1.55it/s]


 81%|██████████████████████████▌      | 40337/50000 [7:19:13<1:43:33,  1.56it/s]


 81%|██████████████████████████▌      | 40338/50000 [7:19:13<1:45:12,  1.53it/s]


 81%|██████████████████████████▌      | 40339/50000 [7:19:14<1:42:59,  1.56it/s]


 81%|██████████████████████████▌      | 40340/50000 [7:19:15<1:41:09,  1.59it/s]


 81%|██████████████████████████▋      | 40341/50000 [7:19:15<1:48:38,  1.48it/s]


 81%|██████████████████████████▋      | 40342/50000 [7:19:16<1:47:21,  1.50it/s]


 81%|██████████████████████████▋      | 40343/50000 [7:19:16<1:41:52,  1.58it/s]


 81%|██████████████████████████▋      | 40344/50000 [7:19:17<1:46:02,  1.52it/s]


 81%|██████████████████████████▋      | 40345/50000 [7:19:18<1:42:42,  1.57it/s]


 81%|██████████████████████████▋      | 40346/50000 [7:19:18<1:41:13,  1.59it/s]


 81%|██████████████████████████▋      | 40347/50000 [7:19:19<1:50:34,  1.45it/s]


 81%|██████████████████████████▋      | 40348/50000 [7:19:20<1:46:30,  1.51it/s]


 81%|██████████████████████████▋      | 40349/50000 [7:19:20<1:45:09,  1.53it/s]


 81%|██████████████████████████▋      | 40350/50000 [7:19:21<1:44:45,  1.54it/s]


 81%|██████████████████████████▋      | 40351/50000 [7:19:22<1:48:34,  1.48it/s]


 81%|██████████████████████████▋      | 40352/50000 [7:19:23<1:48:40,  1.48it/s]


 81%|██████████████████████████▋      | 40353/50000 [7:19:23<1:38:52,  1.63it/s]


 81%|██████████████████████████▋      | 40354/50000 [7:19:24<1:40:59,  1.59it/s]


 81%|██████████████████████████▋      | 40355/50000 [7:19:24<1:43:22,  1.55it/s]


 81%|██████████████████████████▋      | 40356/50000 [7:19:25<1:43:48,  1.55it/s]


 81%|██████████████████████████▋      | 40357/50000 [7:19:26<1:40:01,  1.61it/s]


 81%|██████████████████████████▋      | 40358/50000 [7:19:26<1:37:51,  1.64it/s]


 81%|██████████████████████████▋      | 40359/50000 [7:19:27<1:33:04,  1.73it/s]


 81%|██████████████████████████▋      | 40360/50000 [7:19:27<1:37:46,  1.64it/s]


 81%|██████████████████████████▋      | 40361/50000 [7:19:28<1:43:02,  1.56it/s]


 81%|██████████████████████████▋      | 40362/50000 [7:19:29<1:43:14,  1.56it/s]


 81%|██████████████████████████▋      | 40363/50000 [7:19:29<1:42:27,  1.57it/s]


 81%|██████████████████████████▋      | 40364/50000 [7:19:30<1:40:40,  1.60it/s]


 81%|██████████████████████████▋      | 40365/50000 [7:19:30<1:38:49,  1.62it/s]


 81%|██████████████████████████▋      | 40366/50000 [7:19:31<1:39:58,  1.61it/s]


 81%|██████████████████████████▋      | 40367/50000 [7:19:32<1:41:43,  1.58it/s]


 81%|██████████████████████████▋      | 40368/50000 [7:19:32<1:41:23,  1.58it/s]


 81%|██████████████████████████▋      | 40369/50000 [7:19:33<1:35:12,  1.69it/s]


 81%|██████████████████████████▋      | 40370/50000 [7:19:34<1:41:50,  1.58it/s]


 81%|██████████████████████████▋      | 40371/50000 [7:19:34<1:46:08,  1.51it/s]


 81%|██████████████████████████▋      | 40372/50000 [7:19:35<1:50:23,  1.45it/s]


 81%|██████████████████████████▋      | 40373/50000 [7:19:36<1:58:25,  1.35it/s]


 81%|██████████████████████████▋      | 40374/50000 [7:19:37<1:48:01,  1.49it/s]


 81%|██████████████████████████▋      | 40375/50000 [7:19:37<1:43:39,  1.55it/s]


 81%|██████████████████████████▋      | 40376/50000 [7:19:38<1:44:15,  1.54it/s]


 81%|██████████████████████████▋      | 40377/50000 [7:19:38<1:42:50,  1.56it/s]


 81%|██████████████████████████▋      | 40378/50000 [7:19:39<1:40:36,  1.59it/s]


 81%|██████████████████████████▋      | 40379/50000 [7:19:40<1:40:53,  1.59it/s]


 81%|██████████████████████████▋      | 40380/50000 [7:19:40<1:37:07,  1.65it/s]


 81%|██████████████████████████▋      | 40381/50000 [7:19:41<1:36:52,  1.65it/s]


 81%|██████████████████████████▋      | 40382/50000 [7:19:41<1:39:29,  1.61it/s]


 81%|██████████████████████████▋      | 40383/50000 [7:19:42<1:40:40,  1.59it/s]


 81%|██████████████████████████▋      | 40384/50000 [7:19:43<1:36:58,  1.65it/s]


 81%|██████████████████████████▋      | 40385/50000 [7:19:43<1:38:56,  1.62it/s]


 81%|██████████████████████████▋      | 40386/50000 [7:19:44<1:37:32,  1.64it/s]


 81%|██████████████████████████▋      | 40387/50000 [7:19:45<1:48:04,  1.48it/s]


 81%|██████████████████████████▋      | 40388/50000 [7:19:45<1:52:07,  1.43it/s]


 81%|██████████████████████████▋      | 40389/50000 [7:19:46<1:44:49,  1.53it/s]


 81%|██████████████████████████▋      | 40390/50000 [7:19:47<1:54:51,  1.39it/s]


 81%|██████████████████████████▋      | 40391/50000 [7:19:47<1:48:11,  1.48it/s]


 81%|██████████████████████████▋      | 40392/50000 [7:19:48<1:46:36,  1.50it/s]


 81%|██████████████████████████▋      | 40393/50000 [7:19:49<1:44:50,  1.53it/s]


 81%|██████████████████████████▋      | 40394/50000 [7:19:49<1:45:43,  1.51it/s]


 81%|██████████████████████████▋      | 40395/50000 [7:19:50<1:46:38,  1.50it/s]


 81%|██████████████████████████▋      | 40396/50000 [7:19:51<1:42:32,  1.56it/s]


 81%|██████████████████████████▋      | 40397/50000 [7:19:51<1:38:56,  1.62it/s]


 81%|██████████████████████████▋      | 40398/50000 [7:19:52<1:39:09,  1.61it/s]


 81%|██████████████████████████▋      | 40399/50000 [7:19:52<1:41:56,  1.57it/s]


 81%|██████████████████████████▋      | 40400/50000 [7:19:53<1:36:14,  1.66it/s]


                                                                                
{'loss': 3.1651, 'grad_norm': 3.1814091205596924, 'learning_rate': 0.000192, 'epoch': 2.12}


 81%|██████████████████████████▋      | 40401/50000 [7:19:54<1:36:38,  1.66it/s]


 81%|██████████████████████████▋      | 40402/50000 [7:19:54<1:36:53,  1.65it/s]


 81%|██████████████████████████▋      | 40403/50000 [7:19:55<1:44:31,  1.53it/s]


 81%|██████████████████████████▋      | 40404/50000 [7:19:56<1:56:26,  1.37it/s]


 81%|██████████████████████████▋      | 40405/50000 [7:19:56<1:46:03,  1.51it/s]


 81%|██████████████████████████▋      | 40406/50000 [7:19:57<1:38:48,  1.62it/s]


 81%|██████████████████████████▋      | 40407/50000 [7:19:58<1:40:51,  1.59it/s]


 81%|██████████████████████████▋      | 40408/50000 [7:19:58<1:46:02,  1.51it/s]


 81%|██████████████████████████▋      | 40409/50000 [7:19:59<1:45:54,  1.51it/s]


 81%|██████████████████████████▋      | 40410/50000 [7:20:00<1:44:16,  1.53it/s]


 81%|██████████████████████████▋      | 40411/50000 [7:20:00<1:52:32,  1.42it/s]


 81%|██████████████████████████▋      | 40412/50000 [7:20:01<1:52:53,  1.42it/s]


 81%|██████████████████████████▋      | 40413/50000 [7:20:02<1:47:33,  1.49it/s]


 81%|██████████████████████████▋      | 40414/50000 [7:20:03<1:54:59,  1.39it/s]


 81%|██████████████████████████▋      | 40415/50000 [7:20:03<1:55:49,  1.38it/s]


 81%|██████████████████████████▋      | 40416/50000 [7:20:04<1:53:15,  1.41it/s]


 81%|██████████████████████████▋      | 40417/50000 [7:20:05<1:52:01,  1.43it/s]


 81%|██████████████████████████▋      | 40418/50000 [7:20:05<1:45:42,  1.51it/s]


 81%|██████████████████████████▋      | 40419/50000 [7:20:06<1:49:54,  1.45it/s]


 81%|██████████████████████████▋      | 40420/50000 [7:20:07<1:47:50,  1.48it/s]


 81%|██████████████████████████▋      | 40421/50000 [7:20:07<1:42:28,  1.56it/s]


 81%|██████████████████████████▋      | 40422/50000 [7:20:08<1:38:43,  1.62it/s]


 81%|██████████████████████████▋      | 40423/50000 [7:20:08<1:39:41,  1.60it/s]


 81%|██████████████████████████▋      | 40424/50000 [7:20:09<1:44:45,  1.52it/s]


 81%|██████████████████████████▋      | 40425/50000 [7:20:10<1:36:52,  1.65it/s]


 81%|██████████████████████████▋      | 40426/50000 [7:20:10<1:41:45,  1.57it/s]


 81%|██████████████████████████▋      | 40427/50000 [7:20:11<1:40:00,  1.60it/s]


 81%|██████████████████████████▋      | 40428/50000 [7:20:12<1:38:30,  1.62it/s]


 81%|██████████████████████████▋      | 40429/50000 [7:20:12<1:48:17,  1.47it/s]


 81%|██████████████████████████▋      | 40430/50000 [7:20:13<1:44:52,  1.52it/s]


 81%|██████████████████████████▋      | 40431/50000 [7:20:14<1:40:05,  1.59it/s]


 81%|██████████████████████████▋      | 40432/50000 [7:20:14<1:47:25,  1.48it/s]


 81%|██████████████████████████▋      | 40433/50000 [7:20:15<1:45:45,  1.51it/s]


 81%|██████████████████████████▋      | 40434/50000 [7:20:15<1:40:40,  1.58it/s]


 81%|██████████████████████████▋      | 40435/50000 [7:20:16<1:44:23,  1.53it/s]


 81%|██████████████████████████▋      | 40436/50000 [7:20:17<1:44:51,  1.52it/s]


 81%|██████████████████████████▋      | 40437/50000 [7:20:18<1:44:15,  1.53it/s]


 81%|██████████████████████████▋      | 40438/50000 [7:20:18<1:53:29,  1.40it/s]


 81%|██████████████████████████▋      | 40439/50000 [7:20:19<1:58:44,  1.34it/s]


 81%|██████████████████████████▋      | 40440/50000 [7:20:20<1:58:19,  1.35it/s]


 81%|██████████████████████████▋      | 40441/50000 [7:20:20<1:50:04,  1.45it/s]


 81%|██████████████████████████▋      | 40442/50000 [7:20:21<1:51:21,  1.43it/s]


 81%|██████████████████████████▋      | 40443/50000 [7:20:22<1:52:32,  1.42it/s]


 81%|██████████████████████████▋      | 40444/50000 [7:20:23<1:48:44,  1.46it/s]


 81%|██████████████████████████▋      | 40445/50000 [7:20:23<1:54:47,  1.39it/s]


 81%|██████████████████████████▋      | 40446/50000 [7:20:24<1:48:46,  1.46it/s]


 81%|██████████████████████████▋      | 40447/50000 [7:20:25<1:48:47,  1.46it/s]


 81%|██████████████████████████▋      | 40448/50000 [7:20:25<1:51:19,  1.43it/s]


 81%|██████████████████████████▋      | 40449/50000 [7:20:26<1:47:38,  1.48it/s]


 81%|██████████████████████████▋      | 40450/50000 [7:20:27<1:46:15,  1.50it/s]


{'loss': 3.1836, 'grad_norm': 3.9275295734405518, 'learning_rate': 0.00019, 'epoch': 2.12}

 81%|██████████████████████████▋      | 40500/50000 [7:21:00<1:48:45,  1.46it/s]


{'loss': 3.1154, 'grad_norm': 2.8690340518951416, 'learning_rate': 0.00018800000000000002, 'epoch': 2.13}

 81%|██████████████████████████▊      | 40600/50000 [7:22:05<1:42:58,  1.52it/s]


{'loss': 3.1658, 'grad_norm': 3.7325997352600098, 'learning_rate': 0.000186, 'epoch': 2.13}

 81%|██████████████████████████▊      | 40700/50000 [7:23:10<1:43:08,  1.50it/s]


{'loss': 3.1464, 'grad_norm': 3.036376476287842, 'learning_rate': 0.000184, 'epoch': 2.14}

 82%|██████████████████████████▉      | 40800/50000 [7:24:17<1:44:25,  1.47it/s]


 82%|██████████████████████████▉      | 40801/50000 [7:24:17<1:40:12,  1.53it/s]


 82%|██████████████████████████▉      | 40802/50000 [7:24:18<1:39:41,  1.54it/s]


 82%|██████████████████████████▉      | 40803/50000 [7:24:19<1:48:00,  1.42it/s]


 82%|██████████████████████████▉      | 40804/50000 [7:24:20<1:49:28,  1.40it/s]


 82%|██████████████████████████▉      | 40805/50000 [7:24:20<1:40:41,  1.52it/s]


 82%|██████████████████████████▉      | 40806/50000 [7:24:21<1:48:18,  1.41it/s]


 82%|██████████████████████████▉      | 40807/50000 [7:24:22<1:41:43,  1.51it/s]


 82%|██████████████████████████▉      | 40808/50000 [7:24:22<1:38:15,  1.56it/s]


 82%|██████████████████████████▉      | 40809/50000 [7:24:23<1:42:03,  1.50it/s]


 82%|██████████████████████████▉      | 40810/50000 [7:24:24<1:41:40,  1.51it/s]


 82%|██████████████████████████▉      | 40811/50000 [7:24:24<1:46:12,  1.44it/s]


 82%|██████████████████████████▉      | 40812/50000 [7:24:25<1:43:07,  1.48it/s]


 82%|██████████████████████████▉      | 40813/50000 [7:24:26<1:40:41,  1.52it/s]


 82%|██████████████████████████▉      | 40814/50000 [7:24:26<1:39:49,  1.53it/s]


 82%|██████████████████████████▉      | 40815/50000 [7:24:27<1:38:43,  1.55it/s]


 82%|██████████████████████████▉      | 40816/50000 [7:24:27<1:38:49,  1.55it/s]


 82%|██████████████████████████▉      | 40817/50000 [7:24:28<1:36:52,  1.58it/s]


 82%|██████████████████████████▉      | 40818/50000 [7:24:29<1:36:23,  1.59it/s]


 82%|██████████████████████████▉      | 40819/50000 [7:24:29<1:38:22,  1.56it/s]


{'loss': 3.1481, 'grad_norm': 4.572486400604248, 'learning_rate': 0.000182, 'epoch': 2.14}

 82%|██████████████████████████▉      | 40900/50000 [7:25:22<1:33:42,  1.62it/s]


{'loss': 3.1709, 'grad_norm': 3.8275797367095947, 'learning_rate': 0.00017999999999999998, 'epoch': 2.15}

 82%|███████████████████████████      | 41000/50000 [7:26:27<1:45:32,  1.42it/s]


{'loss': 3.125, 'grad_norm': 4.238916873931885, 'learning_rate': 0.000178, 'epoch': 2.15}

 82%|███████████████████████████▏     | 41100/50000 [7:27:31<1:34:54,  1.56it/s]


{'loss': 3.1762, 'grad_norm': 2.8342385292053223, 'learning_rate': 0.000176, 'epoch': 2.16}

 82%|███████████████████████████▏     | 41200/50000 [7:28:36<1:33:05,  1.58it/s]


{'loss': 3.1689, 'grad_norm': 5.084524631500244, 'learning_rate': 0.000174, 'epoch': 2.16}

 83%|███████████████████████████▎     | 41300/50000 [7:29:42<1:41:47,  1.42it/s]


{'loss': 3.1613, 'grad_norm': 2.7960894107818604, 'learning_rate': 0.00017199999999999998, 'epoch': 2.17}

 83%|███████████████████████████▎     | 41400/50000 [7:30:49<1:31:46,  1.56it/s]


{'loss': 3.1467, 'grad_norm': 3.128586769104004, 'learning_rate': 0.00017, 'epoch': 2.17}

 83%|███████████████████████████▍     | 41500/50000 [7:31:55<1:36:08,  1.47it/s]


{'loss': 3.1863, 'grad_norm': 3.0953073501586914, 'learning_rate': 0.00016800000000000002, 'epoch': 2.18}

 83%|███████████████████████████▌     | 41700/50000 [7:34:05<1:35:58,  1.44it/s]
                                                                                
{'loss': 3.1458, 'grad_norm': 2.8223187923431396, 'learning_rate': 0.00016600000000000002, 'epoch': 2.18}

 84%|███████████████████████████▌     | 41800/50000 [7:35:11<1:21:11,  1.68it/s]
                                                                                
{'loss': 3.1737, 'grad_norm': 3.7624049186706543, 'learning_rate': 0.000164, 'epoch': 2.19}

 84%|███████████████████████████▋     | 41900/50000 [7:36:15<1:23:47,  1.61it/s]
                                                                                
{'loss': 3.1608, 'grad_norm': 3.6586856842041016, 'learning_rate': 0.000162, 'epoch': 2.19}



 84%|███████████████████████████▋     | 41978/50000 [7:37:06<1:31:31,  1.46it/s]


 84%|███████████████████████████▋     | 41979/50000 [7:37:07<1:28:09,  1.52it/s]


 84%|███████████████████████████▋     | 41980/50000 [7:37:08<1:30:44,  1.47it/s]


 84%|███████████████████████████▋     | 41981/50000 [7:37:08<1:27:35,  1.53it/s]


 84%|███████████████████████████▋     | 41982/50000 [7:37:09<1:27:44,  1.52it/s]


 84%|███████████████████████████▋     | 41983/50000 [7:37:10<1:26:50,  1.54it/s]


 84%|███████████████████████████▋     | 41984/50000 [7:37:10<1:25:39,  1.56it/s]


 84%|███████████████████████████▋     | 41985/50000 [7:37:11<1:22:48,  1.61it/s]


 84%|███████████████████████████▋     | 41986/50000 [7:37:11<1:23:35,  1.60it/s]


 84%|███████████████████████████▋     | 41987/50000 [7:37:12<1:31:13,  1.46it/s]


 84%|███████████████████████████▋     | 41988/50000 [7:37:13<1:30:59,  1.47it/s]


 84%|███████████████████████████▋     | 41989/50000 [7:37:14<1:30:03,  1.48it/s]


 84%|███████████████████████████▋     | 41990/50000 [7:37:14<1:30:25,  1.48it/s]


 84%|███████████████████████████▋     | 41991/50000 [7:37:15<1:30:23,  1.48it/s]


 84%|███████████████████████████▋     | 41992/50000 [7:37:16<1:33:47,  1.42it/s]


 84%|███████████████████████████▋     | 41993/50000 [7:37:17<1:39:53,  1.34it/s]


 84%|███████████████████████████▋     | 41994/50000 [7:37:17<1:36:47,  1.38it/s]


 84%|███████████████████████████▋     | 41995/50000 [7:37:18<1:34:21,  1.41it/s]


 84%|███████████████████████████▋     | 41996/50000 [7:37:19<1:37:03,  1.37it/s]


 84%|███████████████████████████▋     | 41997/50000 [7:37:19<1:28:48,  1.50it/s]


 84%|███████████████████████████▋     | 41998/50000 [7:37:20<1:29:29,  1.49it/s]


 84%|███████████████████████████▋     | 41999/50000 [7:37:21<1:33:53,  1.42it/s]


 84%|███████████████████████████▋     | 42000/50000 [7:37:21<1:34:26,  1.41it/s]
                                                                                
{'loss': 3.1663, 'grad_norm': 3.1462206840515137, 'learning_rate': 0.00016, 'epoch': 2.2}

 84%|███████████████████████████▋     | 42000/50000 [7:37:21<1:34:26,  1.41it/s]


{'loss': 3.1518, 'grad_norm': 4.800976753234863, 'learning_rate': 0.000158, 'epoch': 2.2}

 84%|███████████████████████████▊     | 42100/50000 [7:38:27<1:26:11,  1.53it/s]


{'loss': 3.1768, 'grad_norm': 2.7951931953430176, 'learning_rate': 0.000156, 'epoch': 2.21}

 84%|███████████████████████████▊     | 42200/50000 [7:39:31<1:28:27,  1.47it/s]


{'loss': 3.1532, 'grad_norm': 2.711771011352539, 'learning_rate': 0.000154, 'epoch': 2.21}

 85%|███████████████████████████▉     | 42400/50000 [7:41:41<1:21:00,  1.56it/s]
                                                                                
{'loss': 3.1226, 'grad_norm': 2.9689505100250244, 'learning_rate': 0.000152, 'epoch': 2.22}

 85%|████████████████████████████     | 42500/50000 [7:42:47<1:24:11,  1.48it/s]
                                                                                
{'loss': 3.1558, 'grad_norm': 2.8572888374328613, 'learning_rate': 0.00015, 'epoch': 2.23}

 85%|████████████████████████████     | 42600/50000 [7:43:51<1:19:55,  1.54it/s]
                                                                                
{'loss': 3.1303, 'grad_norm': 3.8737986087799072, 'learning_rate': 0.000148, 'epoch': 2.23}

 85%|████████████████████████████▏    | 42700/50000 [7:44:57<1:21:37,  1.49it/s]
                                                                                
{'loss': 3.1319, 'grad_norm': 4.6210503578186035, 'learning_rate': 0.000146, 'epoch': 2.24}

 86%|████████████████████████████▏    | 42800/50000 [7:46:03<1:17:02,  1.56it/s]
                                                                                
{'loss': 3.1201, 'grad_norm': 3.0719785690307617, 'learning_rate': 0.000144, 'epoch': 2.24}

 86%|████████████████████████████▎    | 42900/50000 [7:47:09<1:15:49,  1.56it/s]
                                                                                
{'loss': 3.1582, 'grad_norm': 3.0878732204437256, 'learning_rate': 0.00014199999999999998, 'epoch': 2.25}

 86%|████████████████████████████▍    | 43000/50000 [7:48:15<1:18:33,  1.49it/s]
                                                                                
{'loss': 3.1361, 'grad_norm': 2.9899773597717285, 'learning_rate': 0.00014000000000000001, 'epoch': 2.25}



 86%|████████████████████████████▍    | 43088/50000 [7:49:12<1:11:46,  1.61it/s]


 86%|████████████████████████████▍    | 43089/50000 [7:49:12<1:10:09,  1.64it/s]


 86%|████████████████████████████▍    | 43090/50000 [7:49:13<1:10:46,  1.63it/s]


 86%|████████████████████████████▍    | 43091/50000 [7:49:14<1:12:42,  1.58it/s]


 86%|████████████████████████████▍    | 43092/50000 [7:49:14<1:13:10,  1.57it/s]


 86%|████████████████████████████▍    | 43093/50000 [7:49:15<1:16:52,  1.50it/s]


 86%|████████████████████████████▍    | 43094/50000 [7:49:16<1:17:33,  1.48it/s]


 86%|████████████████████████████▍    | 43095/50000 [7:49:16<1:15:41,  1.52it/s]


 86%|████████████████████████████▍    | 43096/50000 [7:49:17<1:11:59,  1.60it/s]


 86%|████████████████████████████▍    | 43097/50000 [7:49:17<1:13:06,  1.57it/s]


 86%|████████████████████████████▍    | 43098/50000 [7:49:18<1:14:06,  1.55it/s]


 86%|████████████████████████████▍    | 43099/50000 [7:49:19<1:13:29,  1.56it/s]


 86%|████████████████████████████▍    | 43100/50000 [7:49:19<1:14:08,  1.55it/s]
                                                                                
{'loss': 3.1417, 'grad_norm': 2.7673072814941406, 'learning_rate': 0.00013800000000000002, 'epoch': 2.26}

 86%|████████████████████████████▍    | 43100/50000 [7:49:19<1:14:08,  1.55it/s]


{'loss': 3.1403, 'grad_norm': 3.343552589416504, 'learning_rate': 0.00013600000000000003, 'epoch': 2.26}

 86%|████████████████████████████▌    | 43200/50000 [7:50:25<1:19:08,  1.43it/s]


{'loss': 3.1389, 'grad_norm': 3.1414740085601807, 'learning_rate': 0.000134, 'epoch': 2.27}

 87%|████████████████████████████▌    | 43300/50000 [7:51:29<1:14:27,  1.50it/s]


{'loss': 3.1065, 'grad_norm': 3.739251136779785, 'learning_rate': 0.000132, 'epoch': 2.27}

 87%|████████████████████████████▋    | 43400/50000 [7:52:34<1:09:19,  1.59it/s]




 87%|████████████████████████████▋    | 43457/50000 [7:53:11<1:09:05,  1.58it/s]


 87%|████████████████████████████▋    | 43458/50000 [7:53:12<1:15:15,  1.45it/s]


 87%|████████████████████████████▋    | 43459/50000 [7:53:12<1:14:44,  1.46it/s]


 87%|████████████████████████████▋    | 43460/50000 [7:53:13<1:11:24,  1.53it/s]


 87%|████████████████████████████▋    | 43461/50000 [7:53:14<1:08:11,  1.60it/s]


 87%|████████████████████████████▋    | 43462/50000 [7:53:14<1:07:09,  1.62it/s]


 87%|████████████████████████████▋    | 43463/50000 [7:53:15<1:10:39,  1.54it/s]


 87%|████████████████████████████▋    | 43464/50000 [7:53:16<1:14:20,  1.47it/s]


 87%|████████████████████████████▋    | 43465/50000 [7:53:16<1:12:41,  1.50it/s]


 87%|████████████████████████████▋    | 43466/50000 [7:53:17<1:08:52,  1.58it/s]


 87%|████████████████████████████▋    | 43467/50000 [7:53:18<1:11:49,  1.52it/s]


 87%|████████████████████████████▋    | 43468/50000 [7:53:18<1:15:34,  1.44it/s]


 87%|████████████████████████████▋    | 43469/50000 [7:53:19<1:11:17,  1.53it/s]


 87%|████████████████████████████▋    | 43470/50000 [7:53:20<1:11:48,  1.52it/s]


 87%|████████████████████████████▋    | 43471/50000 [7:53:20<1:11:18,  1.53it/s]


 87%|████████████████████████████▋    | 43472/50000 [7:53:21<1:09:30,  1.57it/s]


 87%|████████████████████████████▋    | 43473/50000 [7:53:21<1:08:03,  1.60it/s]


 87%|████████████████████████████▋    | 43474/50000 [7:53:22<1:11:14,  1.53it/s]


 87%|████████████████████████████▋    | 43475/50000 [7:53:23<1:10:17,  1.55it/s]


 87%|████████████████████████████▋    | 43476/50000 [7:53:24<1:14:40,  1.46it/s]


 87%|████████████████████████████▋    | 43477/50000 [7:53:24<1:11:02,  1.53it/s]


 87%|████████████████████████████▋    | 43478/50000 [7:53:25<1:10:21,  1.54it/s]


 87%|████████████████████████████▋    | 43479/50000 [7:53:25<1:08:25,  1.59it/s]


 87%|████████████████████████████▋    | 43480/50000 [7:53:26<1:09:57,  1.55it/s]


 87%|████████████████████████████▋    | 43481/50000 [7:53:27<1:12:20,  1.50it/s]


 87%|████████████████████████████▋    | 43482/50000 [7:53:27<1:12:22,  1.50it/s]


 87%|████████████████████████████▋    | 43483/50000 [7:53:28<1:07:31,  1.61it/s]


 87%|████████████████████████████▋    | 43484/50000 [7:53:29<1:11:14,  1.52it/s]


 87%|████████████████████████████▋    | 43485/50000 [7:53:29<1:10:53,  1.53it/s]


 87%|████████████████████████████▋    | 43486/50000 [7:53:30<1:11:39,  1.52it/s]


 87%|████████████████████████████▋    | 43487/50000 [7:53:31<1:11:34,  1.52it/s]


 87%|████████████████████████████▋    | 43488/50000 [7:53:31<1:11:37,  1.52it/s]


 87%|████████████████████████████▋    | 43489/50000 [7:53:32<1:08:46,  1.58it/s]


 87%|████████████████████████████▋    | 43490/50000 [7:53:33<1:08:44,  1.58it/s]


 87%|████████████████████████████▋    | 43491/50000 [7:53:33<1:12:28,  1.50it/s]


 87%|████████████████████████████▋    | 43492/50000 [7:53:34<1:09:52,  1.55it/s]


 87%|████████████████████████████▋    | 43493/50000 [7:53:34<1:08:14,  1.59it/s]


 87%|████████████████████████████▋    | 43494/50000 [7:53:35<1:08:07,  1.59it/s]


 87%|████████████████████████████▋    | 43495/50000 [7:53:36<1:05:34,  1.65it/s]


 87%|████████████████████████████▋    | 43496/50000 [7:53:36<1:12:12,  1.50it/s]


 87%|████████████████████████████▋    | 43497/50000 [7:53:37<1:09:28,  1.56it/s]


 87%|████████████████████████████▋    | 43498/50000 [7:53:38<1:06:19,  1.63it/s]


 87%|████████████████████████████▋    | 43499/50000 [7:53:38<1:13:07,  1.48it/s]


 87%|████████████████████████████▋    | 43500/50000 [7:53:39<1:09:08,  1.57it/s]


                                                                                
{'loss': 3.1641, 'grad_norm': 3.6897788047790527, 'learning_rate': 0.00013000000000000002, 'epoch': 2.28}

{'loss': 3.1319, 'grad_norm': 3.772240400314331, 'learning_rate': 0.000128, 'epoch': 2.28}

{'loss': 3.1205, 'grad_norm': 3.048663377761841, 'learning_rate': 0.000126, 'epoch': 2.29}

{'loss': 3.1768, 'grad_norm': 2.9078025817871094, 'learning_rate': 0.000124, 'epoch': 2.29}

 88%|████████████████████████████▉    | 43900/50000 [7:57:57<1:03:54,  1.59it/s]
                                                                                
{'loss': 3.1133, 'grad_norm': 5.702817440032959, 'learning_rate': 0.000122, 'epoch': 2.3}

 88%|█████████████████████████████    | 44000/50000 [7:59:03<1:04:07,  1.56it/s]
                                                                                
{'loss': 3.1426, 'grad_norm': 2.924182415008545, 'learning_rate': 0.00012, 'epoch': 2.3}

 88%|██████████████████████████████▊    | 44100/50000 [8:00:07<59:05,  1.66it/s]
                                                                                
{'loss': 3.1126, 'grad_norm': 4.854515075683594, 'learning_rate': 0.000118, 'epoch': 2.31}



 88%|█████████████████████████████▏   | 44198/50000 [8:01:12<1:03:18,  1.53it/s]


 88%|█████████████████████████████▏   | 44199/50000 [8:01:12<1:03:16,  1.53it/s]


 88%|█████████████████████████████▏   | 44200/50000 [8:01:13<1:02:52,  1.54it/s]
                                                                                
{'loss': 3.1489, 'grad_norm': 5.844295978546143, 'learning_rate': 0.00011600000000000001, 'epoch': 2.31}

{'loss': 3.1335, 'grad_norm': 3.1468563079833984, 'learning_rate': 0.000114, 'epoch': 2.32}

{'loss': 3.13, 'grad_norm': 2.8484315872192383, 'learning_rate': 0.000112, 'epoch': 2.32}

{'loss': 3.1234, 'grad_norm': 3.371838092803955, 'learning_rate': 0.00011, 'epoch': 2.33}



 89%|███████████████████████████████▏   | 44567/50000 [8:05:12<56:09,  1.61it/s]


 89%|███████████████████████████████▏   | 44568/50000 [8:05:13<57:15,  1.58it/s]


 89%|███████████████████████████████▏   | 44569/50000 [8:05:14<59:30,  1.52it/s]


 89%|███████████████████████████████▏   | 44570/50000 [8:05:14<59:50,  1.51it/s]


 89%|███████████████████████████████▏   | 44571/50000 [8:05:15<58:52,  1.54it/s]


 89%|███████████████████████████████▏   | 44572/50000 [8:05:16<57:10,  1.58it/s]


 89%|███████████████████████████████▏   | 44573/50000 [8:05:16<55:38,  1.63it/s]


 89%|███████████████████████████████▏   | 44574/50000 [8:05:17<54:12,  1.67it/s]


 89%|███████████████████████████████▏   | 44575/50000 [8:05:17<53:13,  1.70it/s]


 89%|███████████████████████████████▏   | 44576/50000 [8:05:18<57:04,  1.58it/s]


 89%|███████████████████████████████▏   | 44577/50000 [8:05:19<57:26,  1.57it/s]


 89%|███████████████████████████████▏   | 44578/50000 [8:05:19<55:48,  1.62it/s]


 89%|███████████████████████████████▏   | 44579/50000 [8:05:20<55:24,  1.63it/s]


 89%|█████████████████████████████▍   | 44580/50000 [8:05:21<1:02:08,  1.45it/s]


 89%|███████████████████████████████▏   | 44581/50000 [8:05:21<58:25,  1.55it/s]


 89%|███████████████████████████████▏   | 44582/50000 [8:05:22<56:08,  1.61it/s]


 89%|███████████████████████████████▏   | 44583/50000 [8:05:22<54:33,  1.65it/s]


 89%|███████████████████████████████▏   | 44584/50000 [8:05:23<52:19,  1.73it/s]


 89%|███████████████████████████████▏   | 44585/50000 [8:05:24<55:49,  1.62it/s]


 89%|███████████████████████████████▏   | 44586/50000 [8:05:24<59:09,  1.53it/s]


 89%|█████████████████████████████▍   | 44587/50000 [8:05:25<1:00:59,  1.48it/s]


 89%|█████████████████████████████▍   | 44588/50000 [8:05:26<1:00:46,  1.48it/s]


 89%|█████████████████████████████▍   | 44589/50000 [8:05:26<1:00:47,  1.48it/s]


 89%|█████████████████████████████▍   | 44590/50000 [8:05:27<1:00:54,  1.48it/s]


 89%|█████████████████████████████▍   | 44591/50000 [8:05:28<1:01:00,  1.48it/s]


 89%|█████████████████████████████▍   | 44592/50000 [8:05:29<1:00:34,  1.49it/s]


 89%|███████████████████████████████▏   | 44593/50000 [8:05:29<57:31,  1.57it/s]


 89%|███████████████████████████████▏   | 44594/50000 [8:05:30<56:44,  1.59it/s]


 89%|███████████████████████████████▏   | 44595/50000 [8:05:30<57:43,  1.56it/s]


 89%|███████████████████████████████▏   | 44596/50000 [8:05:31<57:33,  1.56it/s]


 89%|███████████████████████████████▏   | 44597/50000 [8:05:32<58:08,  1.55it/s]


 89%|███████████████████████████████▏   | 44598/50000 [8:05:32<55:47,  1.61it/s]


 89%|███████████████████████████████▏   | 44599/50000 [8:05:33<57:23,  1.57it/s]


 89%|███████████████████████████████▏   | 44600/50000 [8:05:34<58:19,  1.54it/s]
                                                                                
{'loss': 3.1107, 'grad_norm': 3.3573856353759766, 'learning_rate': 0.000108, 'epoch': 2.34}

 89%|███████████████████████████████▎   | 44700/50000 [8:06:37<56:37,  1.56it/s]
                                                                                
{'loss': 3.0925, 'grad_norm': 3.09501051902771, 'learning_rate': 0.000106, 'epoch': 2.34}

 90%|███████████████████████████████▎   | 44800/50000 [8:07:42<59:32,  1.46it/s]
                                                                                
{'loss': 3.1254, 'grad_norm': 3.1628000736236572, 'learning_rate': 0.000104, 'epoch': 2.35}

 90%|███████████████████████████████▍   | 44900/50000 [8:08:47<53:25,  1.59it/s]
                                                                                
{'loss': 3.1055, 'grad_norm': 3.410020112991333, 'learning_rate': 0.000102, 'epoch': 2.35}

 90%|███████████████████████████████▌   | 45000/50000 [8:09:51<52:54,  1.57it/s]
{'loss': 3.1471, 'grad_norm': 3.0939083099365234, 'learning_rate': 0.0001, 'epoch': 2.36}

***** Running Evaluation *****
  Num examples = 50
  Batch size = 16
{'eval_rouge-1': 34.426924, 'eval_rouge-2': 8.234613999999999, 'eval_rouge-l': 26.983272000000003, 'eval_bleu-4': 0.03904865270549994, 'eval_runtime': 8.527, 'eval_samples_per_second': 5.864, 'eval_steps_per_second': 0.469, 'epoch': 2.36}
Saving model checkpoint to ./output/tmp-checkpoint-45000
tokenizer config file saved in ./output/tmp-checkpoint-45000/tokenizer_config.json
Special tokens file saved in ./output/tmp-checkpoint-45000/special_tokens_map.json



 90%|███████████████████████████████▌   | 45100/50000 [8:11:05<49:30,  1.65it/s]
{'loss': 3.1363, 'grad_norm': 2.8472232818603516, 'learning_rate': 9.800000000000001e-05, 'epoch': 2.36}

 90%|███████████████████████████████▋   | 45200/50000 [8:12:11<54:10,  1.48it/s]
{'loss': 3.1387, 'grad_norm': 2.8686001300811768, 'learning_rate': 9.6e-05, 'epoch': 2.37}

 91%|███████████████████████████████▋   | 45291/50000 [8:13:10<52:50,  1.49it/s]


 91%|███████████████████████████████▋   | 45292/50000 [8:13:11<51:42,  1.52it/s]


 91%|███████████████████████████████▋   | 45293/50000 [8:13:11<49:50,  1.57it/s]


 91%|███████████████████████████████▋   | 45294/50000 [8:13:12<52:20,  1.50it/s]


 91%|███████████████████████████████▋   | 45295/50000 [8:13:13<51:46,  1.51it/s]


 91%|███████████████████████████████▋   | 45296/50000 [8:13:13<53:02,  1.48it/s]


 91%|███████████████████████████████▋   | 45297/50000 [8:13:14<52:16,  1.50it/s]


 91%|███████████████████████████████▋   | 45298/50000 [8:13:15<50:59,  1.54it/s]


 91%|███████████████████████████████▋   | 45299/50000 [8:13:15<49:21,  1.59it/s]


 91%|███████████████████████████████▋   | 45300/50000 [8:13:16<50:22,  1.56it/s]
                                                                                
{'loss': 3.1461, 'grad_norm': 4.025346279144287, 'learning_rate': 9.400000000000001e-05, 'epoch': 2.37}

 91%|███████████████████████████████▊   | 45400/50000 [8:14:22<50:13,  1.53it/s]
                                                                                
{'loss': 3.1552, 'grad_norm': 2.954277276992798, 'learning_rate': 9.2e-05, 'epoch': 2.38}

 91%|███████████████████████████████▊   | 45500/50000 [8:15:26<52:37,  1.43it/s]
                                                                                
{'loss': 3.0792, 'grad_norm': 2.844332456588745, 'learning_rate': 8.999999999999999e-05, 'epoch': 2.38}

 91%|███████████████████████████████▉   | 45600/50000 [8:16:32<52:45,  1.39it/s]
                                                                                
{'loss': 3.1385, 'grad_norm': 3.1824891567230225, 'learning_rate': 8.8e-05, 'epoch': 2.39}



 91%|███████████████████████████████▉   | 45661/50000 [8:17:12<47:14,  1.53it/s]


 91%|███████████████████████████████▉   | 45662/50000 [8:17:12<44:17,  1.63it/s]


 91%|███████████████████████████████▉   | 45663/50000 [8:17:13<48:34,  1.49it/s]


 91%|███████████████████████████████▉   | 45664/50000 [8:17:14<46:31,  1.55it/s]


 91%|███████████████████████████████▉   | 45665/50000 [8:17:14<45:14,  1.60it/s]


 91%|███████████████████████████████▉   | 45666/50000 [8:17:15<43:31,  1.66it/s]


 91%|███████████████████████████████▉   | 45667/50000 [8:17:16<48:51,  1.48it/s]


 91%|███████████████████████████████▉   | 45668/50000 [8:17:16<46:28,  1.55it/s]


 91%|███████████████████████████████▉   | 45669/50000 [8:17:17<55:06,  1.31it/s]


 91%|███████████████████████████████▉   | 45670/50000 [8:17:18<54:23,  1.33it/s]


 91%|███████████████████████████████▉   | 45671/50000 [8:17:19<50:42,  1.42it/s]


 91%|███████████████████████████████▉   | 45672/50000 [8:17:19<49:04,  1.47it/s]


 91%|███████████████████████████████▉   | 45673/50000 [8:17:20<47:22,  1.52it/s]


 91%|███████████████████████████████▉   | 45674/50000 [8:17:21<47:00,  1.53it/s]


 91%|███████████████████████████████▉   | 45675/50000 [8:17:21<47:26,  1.52it/s]


 91%|███████████████████████████████▉   | 45676/50000 [8:17:22<49:57,  1.44it/s]


 91%|███████████████████████████████▉   | 45677/50000 [8:17:23<49:11,  1.46it/s]


 91%|███████████████████████████████▉   | 45678/50000 [8:17:23<47:08,  1.53it/s]


 91%|███████████████████████████████▉   | 45679/50000 [8:17:24<45:50,  1.57it/s]


 91%|███████████████████████████████▉   | 45680/50000 [8:17:24<46:00,  1.56it/s]


 91%|███████████████████████████████▉   | 45681/50000 [8:17:25<46:20,  1.55it/s]


 91%|███████████████████████████████▉   | 45682/50000 [8:17:26<44:55,  1.60it/s]


 91%|███████████████████████████████▉   | 45683/50000 [8:17:26<44:11,  1.63it/s]


 91%|███████████████████████████████▉   | 45684/50000 [8:17:27<43:48,  1.64it/s]


 91%|███████████████████████████████▉   | 45685/50000 [8:17:28<45:09,  1.59it/s]


 91%|███████████████████████████████▉   | 45686/50000 [8:17:28<47:45,  1.51it/s]


 91%|███████████████████████████████▉   | 45687/50000 [8:17:29<49:26,  1.45it/s]


 91%|███████████████████████████████▉   | 45688/50000 [8:17:30<47:25,  1.52it/s]


 91%|███████████████████████████████▉   | 45689/50000 [8:17:30<45:20,  1.58it/s]


 91%|███████████████████████████████▉   | 45690/50000 [8:17:31<46:10,  1.56it/s]


 91%|███████████████████████████████▉   | 45691/50000 [8:17:32<46:29,  1.54it/s]


 91%|███████████████████████████████▉   | 45692/50000 [8:17:32<45:29,  1.58it/s]


 91%|███████████████████████████████▉   | 45693/50000 [8:17:33<45:35,  1.57it/s]


 91%|███████████████████████████████▉   | 45694/50000 [8:17:33<46:29,  1.54it/s]


 91%|███████████████████████████████▉   | 45695/50000 [8:17:34<45:20,  1.58it/s]


 91%|███████████████████████████████▉   | 45696/50000 [8:17:35<44:07,  1.63it/s]


 91%|███████████████████████████████▉   | 45697/50000 [8:17:35<43:48,  1.64it/s]


 91%|███████████████████████████████▉   | 45698/50000 [8:17:36<43:03,  1.67it/s]


 91%|███████████████████████████████▉   | 45699/50000 [8:17:36<43:51,  1.63it/s]


 91%|███████████████████████████████▉   | 45700/50000 [8:17:37<43:21,  1.65it/s]
                                                                                
{'loss': 3.1022, 'grad_norm': 3.2110393047332764, 'learning_rate': 8.599999999999999e-05, 'epoch': 2.39}

 91%|███████████████████████████████▉   | 45700/50000 [8:17:37<43:21,  1.65it/s]


{'loss': 3.1492, 'grad_norm': 3.7464609146118164, 'learning_rate': 8.400000000000001e-05, 'epoch': 2.4}

 92%|████████████████████████████████   | 45800/50000 [8:18:41<48:20,  1.45it/s]


{'loss': 3.1474, 'grad_norm': 3.0049421787261963, 'learning_rate': 8.2e-05, 'epoch': 2.4}

 92%|████████████████████████████████▏  | 45900/50000 [8:19:45<43:23,  1.57it/s]


{'loss': 3.1236, 'grad_norm': 3.31870174407959, 'learning_rate': 8e-05, 'epoch': 2.41}

 92%|████████████████████████████████▎  | 46100/50000 [8:21:55<41:53,  1.55it/s]
                                                                                
{'loss': 3.1102, 'grad_norm': 3.0865085124969482, 'learning_rate': 7.8e-05, 'epoch': 2.41}

 92%|████████████████████████████████▎  | 46200/50000 [8:23:00<42:02,  1.51it/s]
                                                                                
{'loss': 3.0692, 'grad_norm': 2.647451400756836, 'learning_rate': 7.6e-05, 'epoch': 2.42}

 93%|████████████████████████████████▍  | 46300/50000 [8:24:06<37:33,  1.64it/s]
                                                                                
{'loss': 3.1261, 'grad_norm': 3.237863063812256, 'learning_rate': 7.4e-05, 'epoch': 2.42}

 93%|████████████████████████████████▍  | 46400/50000 [8:25:11<36:35,  1.64it/s]
                                                                                
{'loss': 3.1017, 'grad_norm': 3.4656407833099365, 'learning_rate': 7.2e-05, 'epoch': 2.43}

 93%|████████████████████████████████▌  | 46500/50000 [8:26:16<37:26,  1.56it/s]
                                                                                
{'loss': 3.0867, 'grad_norm': 3.26271390914917, 'learning_rate': 7.000000000000001e-05, 'epoch': 2.43}

 93%|████████████████████████████████▌  | 46600/50000 [8:27:21<36:03,  1.57it/s]
                                                                                
{'loss': 3.1187, 'grad_norm': 3.157302141189575, 'learning_rate': 6.800000000000001e-05, 'epoch': 2.44}

 93%|████████████████████████████████▋  | 46700/50000 [8:28:24<35:41,  1.54it/s]
                                                                                
{'loss': 3.0961, 'grad_norm': 2.9301295280456543, 'learning_rate': 6.6e-05, 'epoch': 2.45}



 94%|████████████████████████████████▋  | 46771/50000 [8:29:10<36:33,  1.47it/s]


 94%|████████████████████████████████▋  | 46772/50000 [8:29:11<34:34,  1.56it/s]


 94%|████████████████████████████████▋  | 46773/50000 [8:29:11<35:14,  1.53it/s]


 94%|████████████████████████████████▋  | 46774/50000 [8:29:12<33:44,  1.59it/s]


 94%|████████████████████████████████▋  | 46775/50000 [8:29:12<33:43,  1.59it/s]


 94%|████████████████████████████████▋  | 46776/50000 [8:29:13<35:34,  1.51it/s]


 94%|████████████████████████████████▋  | 46777/50000 [8:29:14<33:07,  1.62it/s]


 94%|████████████████████████████████▋  | 46778/50000 [8:29:14<35:21,  1.52it/s]


 94%|████████████████████████████████▋  | 46779/50000 [8:29:15<34:26,  1.56it/s]


 94%|████████████████████████████████▋  | 46780/50000 [8:29:16<32:13,  1.67it/s]


 94%|████████████████████████████████▋  | 46781/50000 [8:29:16<32:32,  1.65it/s]


 94%|████████████████████████████████▋  | 46782/50000 [8:29:17<34:37,  1.55it/s]


 94%|████████████████████████████████▋  | 46783/50000 [8:29:18<38:00,  1.41it/s]


 94%|████████████████████████████████▋  | 46784/50000 [8:29:18<36:52,  1.45it/s]


 94%|████████████████████████████████▋  | 46785/50000 [8:29:19<38:55,  1.38it/s]


 94%|████████████████████████████████▊  | 46786/50000 [8:29:20<37:13,  1.44it/s]


 94%|████████████████████████████████▊  | 46787/50000 [8:29:20<36:44,  1.46it/s]


 94%|████████████████████████████████▊  | 46788/50000 [8:29:21<36:41,  1.46it/s]


 94%|████████████████████████████████▊  | 46789/50000 [8:29:22<33:21,  1.60it/s]


 94%|████████████████████████████████▊  | 46790/50000 [8:29:22<32:54,  1.63it/s]


 94%|████████████████████████████████▊  | 46791/50000 [8:29:23<34:35,  1.55it/s]


 94%|████████████████████████████████▊  | 46792/50000 [8:29:24<34:39,  1.54it/s]


 94%|████████████████████████████████▊  | 46793/50000 [8:29:24<35:44,  1.50it/s]


 94%|████████████████████████████████▊  | 46794/50000 [8:29:25<36:43,  1.45it/s]


 94%|████████████████████████████████▊  | 46795/50000 [8:29:26<35:25,  1.51it/s]


 94%|████████████████████████████████▊  | 46796/50000 [8:29:26<35:08,  1.52it/s]


 94%|████████████████████████████████▊  | 46797/50000 [8:29:27<34:13,  1.56it/s]


 94%|████████████████████████████████▊  | 46798/50000 [8:29:27<32:12,  1.66it/s]


 94%|████████████████████████████████▊  | 46799/50000 [8:29:28<34:13,  1.56it/s]


 94%|████████████████████████████████▊  | 46800/50000 [8:29:29<34:29,  1.55it/s]
                                                                                
{'loss': 3.1347, 'grad_norm': 3.719378709793091, 'learning_rate': 6.4e-05, 'epoch': 2.45}

 94%|████████████████████████████████▊  | 46900/50000 [8:30:33<31:07,  1.66it/s]
                                                                                
{'loss': 3.1123, 'grad_norm': 11.09067153930664, 'learning_rate': 6.2e-05, 'epoch': 2.46}

 94%|████████████████████████████████▉  | 47000/50000 [8:31:38<32:01,  1.56it/s]
                                                                                
{'loss': 3.1512, 'grad_norm': 5.297137260437012, 'learning_rate': 6e-05, 'epoch': 2.46}

 94%|████████████████████████████████▉  | 47100/50000 [8:32:42<36:22,  1.33it/s]
                                                                                
{'loss': 3.1135, 'grad_norm': 3.018352746963501, 'learning_rate': 5.800000000000001e-05, 'epoch': 2.47}

 94%|████████████████████████████████▉  | 47100/50000 [8:32:42<36:22,  1.33it/s]


{'loss': 3.125, 'grad_norm': 3.540755033493042, 'learning_rate': 5.6e-05, 'epoch': 2.47}

 94%|█████████████████████████████████  | 47200/50000 [8:33:48<28:39,  1.63it/s]


{'loss': 3.1329, 'grad_norm': 3.286604404449463, 'learning_rate': 5.4e-05, 'epoch': 2.48}

 95%|█████████████████████████████████  | 47300/50000 [8:34:54<29:29,  1.53it/s]


{'loss': 3.0966, 'grad_norm': 3.0490520000457764, 'learning_rate': 5.2e-05, 'epoch': 2.48}

 95%|█████████████████████████████████▏ | 47400/50000 [8:36:01<30:08,  1.44it/s]


{'loss': 3.118, 'grad_norm': 2.9646732807159424, 'learning_rate': 5e-05, 'epoch': 2.49}

 95%|█████████████████████████████████▎ | 47500/50000 [8:37:05<25:37,  1.63it/s]


{'loss': 3.0879, 'grad_norm': 3.2134528160095215, 'learning_rate': 4.8e-05, 'epoch': 2.49}

 95%|█████████████████████████████████▎ | 47600/50000 [8:38:09<26:46,  1.49it/s]


{'loss': 3.1348, 'grad_norm': 3.241626024246216, 'learning_rate': 4.6e-05, 'epoch': 2.5}

 95%|█████████████████████████████████▍ | 47700/50000 [8:39:13<25:19,  1.51it/s]


{'loss': 3.1377, 'grad_norm': 3.528747320175171, 'learning_rate': 4.4e-05, 'epoch': 2.5}

 96%|█████████████████████████████████▍ | 47800/50000 [8:40:18<23:58,  1.53it/s]




 96%|█████████████████████████████████▌ | 47881/50000 [8:41:10<22:30,  1.57it/s]


 96%|█████████████████████████████████▌ | 47882/50000 [8:41:11<23:49,  1.48it/s]


 96%|█████████████████████████████████▌ | 47883/50000 [8:41:12<23:21,  1.51it/s]


 96%|█████████████████████████████████▌ | 47884/50000 [8:41:13<26:19,  1.34it/s]


 96%|█████████████████████████████████▌ | 47885/50000 [8:41:13<24:47,  1.42it/s]


 96%|█████████████████████████████████▌ | 47886/50000 [8:41:14<25:27,  1.38it/s]


 96%|█████████████████████████████████▌ | 47887/50000 [8:41:15<24:15,  1.45it/s]


 96%|█████████████████████████████████▌ | 47888/50000 [8:41:15<22:51,  1.54it/s]


 96%|█████████████████████████████████▌ | 47889/50000 [8:41:16<22:09,  1.59it/s]


 96%|█████████████████████████████████▌ | 47890/50000 [8:41:16<21:40,  1.62it/s]


 96%|█████████████████████████████████▌ | 47891/50000 [8:41:17<23:25,  1.50it/s]


 96%|█████████████████████████████████▌ | 47892/50000 [8:41:18<24:22,  1.44it/s]


 96%|█████████████████████████████████▌ | 47893/50000 [8:41:19<24:15,  1.45it/s]


 96%|█████████████████████████████████▌ | 47894/50000 [8:41:19<25:34,  1.37it/s]


 96%|█████████████████████████████████▌ | 47895/50000 [8:41:20<23:50,  1.47it/s]


 96%|█████████████████████████████████▌ | 47896/50000 [8:41:21<24:22,  1.44it/s]


 96%|█████████████████████████████████▌ | 47897/50000 [8:41:22<24:56,  1.40it/s]


 96%|█████████████████████████████████▌ | 47898/50000 [8:41:22<24:10,  1.45it/s]


 96%|█████████████████████████████████▌ | 47899/50000 [8:41:23<24:30,  1.43it/s]


 96%|█████████████████████████████████▌ | 47900/50000 [8:41:24<23:59,  1.46it/s]
                                                                                
{'loss': 3.1203, 'grad_norm': 3.362733840942383, 'learning_rate': 4.2000000000000004e-05, 'epoch': 2.51}

 96%|█████████████████████████████████▌ | 48000/50000 [8:42:29<23:28,  1.42it/s]
                                                                                
{'loss': 3.1203, 'grad_norm': 3.1171679496765137, 'learning_rate': 4e-05, 'epoch': 2.51}

 96%|█████████████████████████████████▋ | 48100/50000 [8:43:35<21:05,  1.50it/s]
                                                                                
{'loss': 3.1162, 'grad_norm': 2.9046573638916016, 'learning_rate': 3.8e-05, 'epoch': 2.52}

 96%|█████████████████████████████████▋ | 48200/50000 [8:44:40<18:29,  1.62it/s]
                                                                                
{'loss': 3.1169, 'grad_norm': 4.269341945648193, 'learning_rate': 3.6e-05, 'epoch': 2.52}



 96%|█████████████████████████████████▊ | 48250/50000 [8:45:12<17:19,  1.68it/s]


 97%|█████████████████████████████████▊ | 48251/50000 [8:45:13<18:43,  1.56it/s]


 97%|█████████████████████████████████▊ | 48252/50000 [8:45:13<18:31,  1.57it/s]


 97%|█████████████████████████████████▊ | 48253/50000 [8:45:14<18:43,  1.56it/s]


 97%|█████████████████████████████████▊ | 48254/50000 [8:45:14<17:53,  1.63it/s]


 97%|█████████████████████████████████▊ | 48255/50000 [8:45:15<18:22,  1.58it/s]


 97%|█████████████████████████████████▊ | 48256/50000 [8:45:16<17:52,  1.63it/s]


 97%|█████████████████████████████████▊ | 48257/50000 [8:45:16<17:15,  1.68it/s]


 97%|█████████████████████████████████▊ | 48258/50000 [8:45:17<17:41,  1.64it/s]


 97%|█████████████████████████████████▊ | 48259/50000 [8:45:18<18:00,  1.61it/s]


 97%|█████████████████████████████████▊ | 48260/50000 [8:45:18<17:36,  1.65it/s]


 97%|█████████████████████████████████▊ | 48261/50000 [8:45:19<18:02,  1.61it/s]


 97%|█████████████████████████████████▊ | 48262/50000 [8:45:19<17:54,  1.62it/s]


 97%|█████████████████████████████████▊ | 48263/50000 [8:45:20<18:01,  1.61it/s]


 97%|█████████████████████████████████▊ | 48264/50000 [8:45:21<17:37,  1.64it/s]


 97%|█████████████████████████████████▊ | 48265/50000 [8:45:21<17:02,  1.70it/s]


 97%|█████████████████████████████████▊ | 48266/50000 [8:45:22<18:04,  1.60it/s]


 97%|█████████████████████████████████▊ | 48267/50000 [8:45:23<19:00,  1.52it/s]


 97%|█████████████████████████████████▊ | 48268/50000 [8:45:23<18:30,  1.56it/s]


 97%|█████████████████████████████████▊ | 48269/50000 [8:45:24<18:09,  1.59it/s]


 97%|█████████████████████████████████▊ | 48270/50000 [8:45:24<17:34,  1.64it/s]


 97%|█████████████████████████████████▊ | 48271/50000 [8:45:25<17:06,  1.68it/s]


 97%|█████████████████████████████████▊ | 48272/50000 [8:45:25<16:28,  1.75it/s]


 97%|█████████████████████████████████▊ | 48273/50000 [8:45:26<17:12,  1.67it/s]


 97%|█████████████████████████████████▊ | 48274/50000 [8:45:27<17:30,  1.64it/s]


 97%|█████████████████████████████████▊ | 48275/50000 [8:45:27<17:25,  1.65it/s]


 97%|█████████████████████████████████▊ | 48276/50000 [8:45:28<17:07,  1.68it/s]


 97%|█████████████████████████████████▊ | 48277/50000 [8:45:29<18:32,  1.55it/s]


 97%|█████████████████████████████████▊ | 48278/50000 [8:45:29<17:43,  1.62it/s]


 97%|█████████████████████████████████▊ | 48279/50000 [8:45:30<18:13,  1.57it/s]


 97%|█████████████████████████████████▊ | 48280/50000 [8:45:31<18:15,  1.57it/s]


 97%|█████████████████████████████████▊ | 48281/50000 [8:45:31<17:30,  1.64it/s]


 97%|█████████████████████████████████▊ | 48282/50000 [8:45:32<17:53,  1.60it/s]


 97%|█████████████████████████████████▊ | 48283/50000 [8:45:32<17:53,  1.60it/s]


 97%|█████████████████████████████████▊ | 48284/50000 [8:45:33<18:00,  1.59it/s]


 97%|█████████████████████████████████▊ | 48285/50000 [8:45:34<17:38,  1.62it/s]


 97%|█████████████████████████████████▊ | 48286/50000 [8:45:34<19:33,  1.46it/s]


 97%|█████████████████████████████████▊ | 48287/50000 [8:45:35<18:36,  1.53it/s]


 97%|█████████████████████████████████▊ | 48288/50000 [8:45:36<17:57,  1.59it/s]


 97%|█████████████████████████████████▊ | 48289/50000 [8:45:36<17:32,  1.63it/s]


 97%|█████████████████████████████████▊ | 48290/50000 [8:45:37<17:14,  1.65it/s]


 97%|█████████████████████████████████▊ | 48291/50000 [8:45:37<17:05,  1.67it/s]


 97%|█████████████████████████████████▊ | 48292/50000 [8:45:38<17:44,  1.60it/s]


 97%|█████████████████████████████████▊ | 48293/50000 [8:45:39<18:48,  1.51it/s]


 97%|█████████████████████████████████▊ | 48294/50000 [8:45:39<18:18,  1.55it/s]


 97%|█████████████████████████████████▊ | 48295/50000 [8:45:40<17:27,  1.63it/s]


 97%|█████████████████████████████████▊ | 48296/50000 [8:45:41<17:31,  1.62it/s]


 97%|█████████████████████████████████▊ | 48297/50000 [8:45:41<18:52,  1.50it/s]


 97%|█████████████████████████████████▊ | 48298/50000 [8:45:42<19:19,  1.47it/s]


 97%|█████████████████████████████████▊ | 48299/50000 [8:45:43<22:14,  1.27it/s]


 97%|█████████████████████████████████▊ | 48300/50000 [8:45:44<20:41,  1.37it/s]
                                                                                
{'loss': 3.1272, 'grad_norm': 3.2489213943481445, 'learning_rate': 3.4000000000000007e-05, 'epoch': 2.53}

 97%|█████████████████████████████████▉ | 48400/50000 [8:46:48<18:29,  1.44it/s]
                                                                                
{'loss': 3.1098, 'grad_norm': 3.4025847911834717, 'learning_rate': 3.2e-05, 'epoch': 2.53}

 97%|█████████████████████████████████▉ | 48500/50000 [8:47:52<15:07,  1.65it/s]
                                                                                
{'loss': 3.1325, 'grad_norm': 3.0853631496429443, 'learning_rate': 3e-05, 'epoch': 2.54}

 97%|██████████████████████████████████ | 48600/50000 [8:48:56<15:18,  1.52it/s]
                                                                                
{'loss': 3.1516, 'grad_norm': 3.0851123332977295, 'learning_rate': 2.8e-05, 'epoch': 2.54}

 97%|██████████████████████████████████ | 48600/50000 [8:48:56<15:18,  1.52it/s]


{'loss': 3.098, 'grad_norm': 3.149322032928467, 'learning_rate': 2.6e-05, 'epoch': 2.55}

 97%|██████████████████████████████████ | 48700/50000 [8:50:00<14:18,  1.51it/s]


{'loss': 3.1132, 'grad_norm': 3.618619441986084, 'learning_rate': 2.4e-05, 'epoch': 2.55}

 98%|██████████████████████████████████▏| 48800/50000 [8:51:05<12:34,  1.59it/s]


{'loss': 3.1188, 'grad_norm': 3.3257639408111572, 'learning_rate': 2.2e-05, 'epoch': 2.56}

 98%|██████████████████████████████████▏| 48900/50000 [8:52:09<11:54,  1.54it/s]




 98%|██████████████████████████████████▎| 48991/50000 [8:53:08<10:23,  1.62it/s]


 98%|██████████████████████████████████▎| 48992/50000 [8:53:08<10:42,  1.57it/s]


 98%|██████████████████████████████████▎| 48993/50000 [8:53:09<10:17,  1.63it/s]


 98%|██████████████████████████████████▎| 48994/50000 [8:53:10<10:21,  1.62it/s]


 98%|██████████████████████████████████▎| 48995/50000 [8:53:10<10:13,  1.64it/s]


 98%|██████████████████████████████████▎| 48996/50000 [8:53:11<10:52,  1.54it/s]


 98%|██████████████████████████████████▎| 48997/50000 [8:53:12<12:06,  1.38it/s]


 98%|██████████████████████████████████▎| 48998/50000 [8:53:13<11:42,  1.43it/s]


 98%|██████████████████████████████████▎| 48999/50000 [8:53:13<11:10,  1.49it/s]


 98%|██████████████████████████████████▎| 49000/50000 [8:53:14<11:08,  1.50it/s]
                                                                                
{'loss': 3.0787, 'grad_norm': 4.719484329223633, 'learning_rate': 2e-05, 'epoch': 2.57}

 98%|██████████████████████████████████▎| 49000/50000 [8:53:14<11:08,  1.50it/s]


                                                                                
{'loss': 3.0807, 'grad_norm': 4.262101650238037, 'learning_rate': 1.8e-05, 'epoch': 2.57}

 98%|██████████████████████████████████▎| 49100/50000 [8:54:19<09:09,  1.64it/s]


                                                                                
{'loss': 3.0781, 'grad_norm': 3.354680299758911, 'learning_rate': 1.6e-05, 'epoch': 2.58}

 98%|██████████████████████████████████▍| 49200/50000 [8:55:22<08:08,  1.64it/s]


                                                                                
{'loss': 3.1447, 'grad_norm': 3.0245466232299805, 'learning_rate': 1.4e-05, 'epoch': 2.58}

 99%|██████████████████████████████████▌| 49300/50000 [8:56:27<07:32,  1.55it/s]




 99%|██████████████████████████████████▌| 49359/50000 [8:57:06<06:38,  1.61it/s]


 99%|██████████████████████████████████▌| 49360/50000 [8:57:07<06:30,  1.64it/s]


 99%|██████████████████████████████████▌| 49361/50000 [8:57:07<06:21,  1.67it/s]


 99%|██████████████████████████████████▌| 49362/50000 [8:57:08<06:17,  1.69it/s]


 99%|██████████████████████████████████▌| 49363/50000 [8:57:09<06:43,  1.58it/s]


 99%|██████████████████████████████████▌| 49364/50000 [8:57:09<07:07,  1.49it/s]


 99%|██████████████████████████████████▌| 49365/50000 [8:57:10<06:43,  1.57it/s]


 99%|██████████████████████████████████▌| 49366/50000 [8:57:10<06:29,  1.63it/s]


 99%|██████████████████████████████████▌| 49367/50000 [8:57:11<06:18,  1.67it/s]


 99%|██████████████████████████████████▌| 49368/50000 [8:57:12<06:19,  1.66it/s]


 99%|██████████████████████████████████▌| 49369/50000 [8:57:12<06:19,  1.66it/s]


 99%|██████████████████████████████████▌| 49370/50000 [8:57:13<06:18,  1.66it/s]


 99%|██████████████████████████████████▌| 49371/50000 [8:57:14<06:47,  1.54it/s]


 99%|██████████████████████████████████▌| 49372/50000 [8:57:14<06:42,  1.56it/s]


 99%|██████████████████████████████████▌| 49373/50000 [8:57:15<06:45,  1.55it/s]


 99%|██████████████████████████████████▌| 49374/50000 [8:57:15<06:35,  1.58it/s]


 99%|██████████████████████████████████▌| 49375/50000 [8:57:16<06:54,  1.51it/s]


 99%|██████████████████████████████████▌| 49376/50000 [8:57:17<07:09,  1.45it/s]


 99%|██████████████████████████████████▌| 49377/50000 [8:57:17<06:50,  1.52it/s]


 99%|██████████████████████████████████▌| 49378/50000 [8:57:18<06:50,  1.51it/s]


 99%|██████████████████████████████████▌| 49379/50000 [8:57:19<06:49,  1.52it/s]


 99%|██████████████████████████████████▌| 49380/50000 [8:57:20<07:06,  1.45it/s]


 99%|██████████████████████████████████▌| 49381/50000 [8:57:20<07:28,  1.38it/s]


 99%|██████████████████████████████████▌| 49382/50000 [8:57:21<07:08,  1.44it/s]


 99%|██████████████████████████████████▌| 49383/50000 [8:57:22<06:42,  1.53it/s]


 99%|██████████████████████████████████▌| 49384/50000 [8:57:22<06:23,  1.61it/s]


 99%|██████████████████████████████████▌| 49385/50000 [8:57:23<06:16,  1.63it/s]


 99%|██████████████████████████████████▌| 49386/50000 [8:57:23<06:12,  1.65it/s]


 99%|██████████████████████████████████▌| 49387/50000 [8:57:24<06:24,  1.59it/s]


 99%|██████████████████████████████████▌| 49388/50000 [8:57:24<06:04,  1.68it/s]


 99%|██████████████████████████████████▌| 49389/50000 [8:57:25<05:59,  1.70it/s]


 99%|██████████████████████████████████▌| 49390/50000 [8:57:26<05:57,  1.71it/s]


 99%|██████████████████████████████████▌| 49391/50000 [8:57:26<06:45,  1.50it/s]


 99%|██████████████████████████████████▌| 49392/50000 [8:57:27<06:27,  1.57it/s]


 99%|██████████████████████████████████▌| 49393/50000 [8:57:28<06:17,  1.61it/s]


 99%|██████████████████████████████████▌| 49394/50000 [8:57:28<06:13,  1.62it/s]


 99%|██████████████████████████████████▌| 49395/50000 [8:57:29<06:16,  1.61it/s]


 99%|██████████████████████████████████▌| 49396/50000 [8:57:30<06:20,  1.59it/s]


 99%|██████████████████████████████████▌| 49397/50000 [8:57:30<06:21,  1.58it/s]


 99%|██████████████████████████████████▌| 49398/50000 [8:57:31<06:24,  1.57it/s]


 99%|██████████████████████████████████▌| 49399/50000 [8:57:32<06:37,  1.51it/s]


 99%|██████████████████████████████████▌| 49400/50000 [8:57:32<06:39,  1.50it/s]
                                                                                
{'loss': 3.0941, 'grad_norm': 5.568228721618652, 'learning_rate': 1.2e-05, 'epoch': 2.59}

 99%|██████████████████████████████████▌| 49400/50000 [8:57:32<06:39,  1.50it/s]


{'loss': 3.118, 'grad_norm': 3.0396335124969482, 'learning_rate': 1e-05, 'epoch': 2.59}

 99%|██████████████████████████████████▋| 49500/50000 [8:58:36<04:57,  1.68it/s]


{'loss': 3.0943, 'grad_norm': 3.602935314178467, 'learning_rate': 8e-06, 'epoch': 2.6}

 99%|██████████████████████████████████▋| 49600/50000 [8:59:40<04:12,  1.59it/s]


{'loss': 3.1001, 'grad_norm': 3.2410998344421387, 'learning_rate': 6e-06, 'epoch': 2.6}

 99%|██████████████████████████████████▊| 49700/50000 [9:00:46<03:04,  1.63it/s]


{'loss': 3.0808, 'grad_norm': 3.456488847732544, 'learning_rate': 4e-06, 'epoch': 2.61}

100%|██████████████████████████████████▊| 49800/50000 [9:01:50<02:07,  1.56it/s]


{'loss': 3.1173, 'grad_norm': 3.071300983428955, 'learning_rate': 2e-06, 'epoch': 2.61}

100%|██████████████████████████████████▉| 49900/50000 [9:02:55<01:06,  1.51it/s]


{'loss': 3.0703, 'grad_norm': 3.8544743061065674, 'learning_rate': 0.0, 'epoch': 2.62}

100%|███████████████████████████████████| 50000/50000 [9:03:59<00:00,  1.58it/s]
***** Running Evaluation *****
  Num examples = 50
  Batch size = 16

100%|█████████████████████████████████████████████| 4/4 [00:09<00:00,  2.48s/it]
{'eval_rouge-1': 33.461914, 'eval_rouge-2': 8.040958, 'eval_rouge-l': 25.72797, 'eval_bleu-4': 0.03739291547618872, 'eval_runtime': 16.1492, 'eval_samples_per_second': 3.096, 'eval_steps_per_second': 0.248, 'epoch': 2.62}

Saving model checkpoint to ./output/tmp-checkpoint-50000
tokenizer config file saved in ./output/tmp-checkpoint-50000/tokenizer_config.json
Special tokens file saved in ./output/tmp-checkpoint-50000/special_tokens_map.json




Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 32656.1028, 'train_samples_per_second': 9.187, 'train_steps_per_second': 1.531, 'train_loss': 3.28082859375, 'epoch': 2.62}

100%|███████████████████████████████████| 50000/50000 [9:04:16<00:00,  1.53it/s]
***** Running Prediction *****
  Num examples = 1070
  Batch size = 16



100%|███████████████████████████████████████████| 67/67 [02:34<00:00,  2.31s/it]


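## 3. Inference with the fine-tuned model
After fine-tuning finishes, the `output` folder contains a number of `checkpoint-*` folders; the number in each name is the training step at which that checkpoint was saved.
We pick the weights from the last checkpoint and load them with `inference_hf.py`.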

In [3]:
!ls output/

checkpoint-10000  checkpoint-25000  checkpoint-40000  checkpoint-50000
checkpoint-15000  checkpoint-30000  checkpoint-45000
checkpoint-20000  checkpoint-35000  checkpoint-5000
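
Note that `ls` sorts names lexicographically, so `checkpoint-5000` is listed last even though `checkpoint-50000` is the most recent one. The following is a minimal sketch, not part of the original notebook, of picking the checkpoint with the highest step number programmatically. The helper name `latest_checkpoint` is hypothetical, and the sketch assumes the `checkpoint-<step>` naming shown in the listing above.

```python
import re
from pathlib import Path
from typing import Optional, Union


def latest_checkpoint(output_dir: Union[str, Path]) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the largest step, or None.

    Hypothetical helper: assumes directories are named like `checkpoint-50000`.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None

    pattern = re.compile(r"^checkpoint-(\d+)$")
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = pattern.match(child.name)
        if match is None or not child.is_dir():
            continue
        step = int(match.group(1))  # compare numerically, not as strings
        if step > best_step:
            best_step, best_path = step, child
    return best_path


# Prints the newest checkpoint directory, or None if `output/` does not exist.
print(latest_checkpoint("output"))
```

Comparing the step as an integer is what avoids the `checkpoint-5000` versus `checkpoint-50000` ordering trap.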


In [2]:
!CUDA_VISIBLE_DEVICES=1  /root/miniconda3/envs/llm/bin/python inference_hf.py output/checkpoint-50000/ --prompt "类型#裙*版型#显瘦*材质#网纱*风格#性感*裙型#百褶*裙下摆#压褶*裙长#连衣裙*裙衣门襟#拉链*裙衣门襟#套头*裙款式#拼接*裙款式#拉链*裙款式#木耳边*裙款式#抽褶*裙款式#不规则"

Loading checkpoint shards: 100%|██████████████████| 7/7 [00:00<00:00, 15.00it/s]
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
这款连衣裙采用拼接木耳边设计，优雅中不失性感，不规则的裙摆，增加裙子的时尚感。袖口和裙摆采用网纱压褶设计，穿着更加有层次感。拼接的百褶裙摆，穿着更加显瘦，套头的设计，穿脱方便，腰间有拉链设计，穿着更加方便 Roland<UNK>。


In [1]:
!CUDA_VISIBLE_DEVICES=1  /root/miniconda3/envs/llm/bin/python inference_hf.py output/checkpoint-10000/ --prompt "类型#裙*版型#显瘦*材质#网纱*风格#性感*裙型#百褶*裙下摆#压褶*裙长#连衣裙*裙衣门襟#拉链*裙衣门襟#套头*裙款式#拼接*裙款式#拉链*裙款式#木耳边*裙款式#抽褶*裙款式#不规则"

Loading checkpoint shards: 100%|██████████████████| 7/7 [00:00<00:00, 12.56it/s]
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
这款连衣裙，网纱拼接设计，穿着舒适，舒适度好。不规则的百褶裙摆，增添几分灵动感。加上抽褶的百褶，轻松遮肉显瘦，轻松穿出纤细的腰肢。拼接的木耳边，增加整体层次感。而压褶的百褶裙摆，更添几分俏皮感。加上门襟的拉链，穿着方便，穿着舒适。
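
The two cells above run the same prompt against `checkpoint-50000` and `checkpoint-10000`. As a rough sketch of how that comparison could be scripted, the snippet below builds the same command line for several checkpoints. It assumes only the `inference_hf.py <checkpoint_dir> --prompt <text>` form used above; the helper name `build_inference_cmd` is hypothetical, the prompt is a shortened version of the one in the cells, and the actual launch is left commented out because it needs a GPU and the fine-tuned weights.

```python
import shlex
import sys
from typing import List


def build_inference_cmd(checkpoint_dir: str, prompt: str,
                        python_bin: str = sys.executable) -> List[str]:
    """Build the argv list for one inference run (hypothetical helper).

    Mirrors the invocation in the cells above:
    <python> inference_hf.py <checkpoint_dir> --prompt <prompt>
    """
    return [python_bin, "inference_hf.py", checkpoint_dir, "--prompt", prompt]


# Shortened version of the prompt used above, for illustration only.
prompt = "类型#裙*版型#显瘦*材质#网纱*风格#性感"

for step in (10000, 50000):
    cmd = build_inference_cmd(f"output/checkpoint-{step}/", prompt)
    print(shlex.join(cmd))  # show the command that would be executed
    # To actually run it (requires the GPU environment from this notebook):
    # import os, subprocess
    # env = {**os.environ, "CUDA_VISIBLE_DEVICES": "1"}
    # subprocess.run(cmd, check=True, env=env)
```

Passing the command as a list rather than a single shell string means the `#` and `*` characters in the prompt need no extra quoting.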


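## 4. Summary
At this point we have finished LoRA fine-tuning of the ChatGLM3-6B model on a single GPU, so that it generates better advertising copy.
In this chapter you learned:
+ How to fine-tune the model with LoRA
+ How to prepare and align the fine-tuning dataset
+ How to run inference with the fine-tuned model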