[nmt-2.0] #21

Open · natalymr opened this issue Apr 10, 2020 · 8 comments

No description provided.

natalymr created this issue from a note in gcm_summer2019 (In Progress) Apr 10, 2020

natalymr commented Apr 10, 2020

  • add eval
  • add BLEU to eval (see the sketch below)
  • clean up the code
  • run on the machine
  • if it does not train, send it to Sasha for review
  • 100 epochs
  • 200 epochs
  • a different lr
  • run the baselines
  • read up on how to interpret the wandb plots
  • read an article about the BLEU score
  • build a dataset with 200 tokens
  • add dropout/batch_normalization
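
For the BLEU items in the list, a minimal sketch of scoring eval output with NLTK's corpus_bleu; `evaluate_bleu` and the tokenized `hypotheses`/`references` inputs are assumptions for illustration, not the project's code:

```python
# Hedged sketch: corpus-level BLEU over generated commit messages, assuming
# `hypotheses` and `references` are lists of token lists from the eval loop.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def evaluate_bleu(hypotheses, references):
    wrapped_refs = [[ref] for ref in references]   # one reference per hypothesis
    smoothing = SmoothingFunction().method1        # avoids zero scores on short outputs
    return 100 * corpus_bleu(wrapped_refs, hypotheses, smoothing_function=smoothing)

# Toy usage:
print(evaluate_bleu([["fix", "typo"]], [["fix", "typo", "in", "readme"]]))
```

corpus_bleu itself returns a value in [0, 1]; the ×100 is only there to match the percentage-style test_bleu numbers quoted later in this thread.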

natalymr commented Apr 20, 2020

h = 2, 1000 epochs, 10 examples, same train/test
(image)
h = 2, 1000 epochs, 10 examples, different train/test
(image)
h = 256, 1000 epochs, 10 examples, same train/test
(image)
h = 256, 10 epochs, full dataset, different train/test
(image)
h = 256, 50 epochs, full dataset, different train/test
(image)
h = 256, 30 epochs, full dataset, different lr, different train/test
(image)
h = 256, 30 epochs, 2000-example dataset, lr = 0.001, different train/test
(image)
h = 256, 30 epochs, 2000-example dataset, lr = 0.0001, different train/test
(image)
h = 256, 30 epochs, 100-example dataset, lr = 0.0001, different train/test
(image)

natalymr commented Apr 22, 2020

Runs on the Mac

500 epochs, dataset 100 train != 100 test, lr = 0.05, step_size=150, gamma=0.1, bs=10*10, test every 10, FIXED grad acc - seems no different from the previous one, yet the scores are somehow completely different; BEFORE BIDIRECTIONAL, added clip_grad

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1s09khnv

500 epochs, dataset 100 train != 100 test, lr = 0.05, step_size=150, gamma=0.1, bs=10*10, test every 10, FIXED grad acc - seems no different from the previous one, yet the scores are somehow completely different; BEFORE BIDIRECTIONAL

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/35uz9xt0

500 epochs, dataset 100 train != 100 test, lr = 0.1, step_size=100, gamma=0.1, bs=10*10, test every 10, FIXED grad acc

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/cnqca8ml

500 epochs, dataset 100 train != 100 test, lr = 0.1, step_size=100, gamma=0.1, bs=10*10, test every 10

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1o31kd14

added grad acc (bs = 1, optimizer step every 5 steps)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/uaoy1i2r

simply the previous run

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2iarf6cm

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with p = 0.5, lr started at 0.001 and multiplied by 0.2 after 500 epochs, added pack_padded

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ypzmdhj4

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with p = 0.5, lr started at 0.001 and multiplied by 0.2 after 500 epochs, removed dropout, clip_grad

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2bhi9b3z

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with p = 0.5, lr started at 0.001 and multiplied by 0.2 after 500 epochs, clip_grad(0.25)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3nkuv908

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with p = 0.5, lr started at 0.001 and multiplied by 0.2 after 500 epochs, added dropout on the decoder LSTM, clip_grad(0.25)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3v4h52mv?workspace=user-natalymr

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with p = 0.5, lr started at 0.001 and increased 10x after 500 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ba9k4hxu?workspace=user-natalymr

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with p = 0.5, lr increased 10x every 500 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1612jgvi

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with p = 0.5, lr decreased 10x every 500 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2kdz2ntc

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with p = 0.5, lr decreased 10x every 100 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2ar6y7nu

h = 2, init = xavier_uniform_, 10 dataset, train != test, added dropout everywhere with p = 0.5

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/cp5vlq38

h = 2, init = xavier_uniform_, 10 dataset, train=test, added dropout everywhere with p = 0.5

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3msbqrk2

h = 2, init = xavier_normal_, 10 dataset, train=test

here you can see that

h = 2, init = normal, 10 dataset, train=test

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3nxm5dk6?workspace=user-natalymr

h = 2, no init, 10 dataset, train=test

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2bjp9eqj?workspace=user-natalymr

Runs on the machine

1000 epochs, 100 dataset, train != test, lr = 0.0001

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/xc58t4gg?workspace=user-natalymr

200 epochs, init = xavier_normal_, 100 dataset, train != test, lr = 0.01

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/zx11267c

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr = 0.01

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/b12hg64j

200 epochs, init = xavier_normal_, 100 dataset, train != test, lr decayed every 50 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/4rx2i84m

200 epochs, init = xavier_normal_, 100 dataset, train != test, lr decayed every 500 epochs

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ttyv6fs2

1000 epochs, init = xavier_normal_, 100 dataset, train != test, lr decayed every 500 epochs, batch size = 100 instead of 10; hid_size 400 instead of 300

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/mpr4zj4q

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr decayed every 500 epochs, batch size = 100 instead of 10; hid_size 400 instead of 300

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ymhus29z

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr NOT decayed (0.001), batch size = 100 instead of 10; hid_size 400 instead of 300

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ej4s0x8t

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr INCREASED 5x every 500 epochs (starting at 0.001), batch size = 100 instead of 10; hid_size 400 instead of 300

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/wy8r2kfu

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr halved every 500 epochs (instead of 10x), starting at 0.001, batch size = 100 instead of 10; hid_size 400 instead of 300

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/5gnbvy1d

2000 epochs, init = xavier_normal_, 100 dataset, train != test, lr not decayed (0.001), batch size = 100 instead of 10; hid_size 400 instead of 300, added DROPOUT (encoder: embed, lstm; decoder: embed)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/dk5czbor

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr not decayed (0.001), batch size = 100 instead of 10; hid_size 400 instead of 300, added DROPOUT (encoder: embed, lstm; decoder: embed)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/hysoott0

200 epochs, init = xavier_normal_, 2000 dataset, train != test, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT (encoder: embed, lstm; decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/513ukeok

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/zyvjmits

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1), sort dataset = True

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/r1ik0206?workspace=user-natalymr

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1), sort dataset = True, SKIP PADDING

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/anwy0t7z

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), lr NOT decayed, sort dataset = True, skip padding

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/nobbnkk3

59 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.01, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), lr not decayed, sort dataset = FALSE, shuffle=TRUE, skip padding

it started generating <sos> and <eos> 😡

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/bdtttitj

200 epochs, init = xavier_normal_, 2000 dataset, train != test, lr = 0.1, batch size = 130 instead of 10; hid_size 400 instead of 300, added DROPOUT 0.2 (encoder: embed, lstm; decoder: embed), lr not decayed, sort dataset = TRUE, shuffle=TRUE, skip padding

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/nlz2wr98

200 epochs, init = xavier_normal_, 2000 dataset, train != test, hid_size 256 instead of 400, added dropout 0.2 (encoder: embed, lstm; decoder: embed), lr not decayed, sort dataset = true, shuffle=true, skip padding; GRAD ACC (bs=100*2) & lr = 0.1, step_size=100, gamma=0.1 & test_every 2

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/45tyzit5

30 epochs, init = xavier_normal_, 2000 dataset, train != test, hid_size 256 instead of 400, added dropout 0.2 (encoder: embed, lstm; decoder: embed), lr not decayed, sort dataset = true, shuffle=true, skip padding; GRAD ACC (bs=100*5(!!!)) & lr = 0.1, step_size=100, gamma=0.1 & test_every 5(!) - silly me, implemented GRAD ACC incorrectly

generates almost the same output for nearly everything + repeats the same word within a single message

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/q5okz6q1

100 epochs, init = xavier_normal_, 2000 dataset, train != test, hid_size 256 instead of 400, added dropout 0.2 (encoder: embed, lstm; decoder: embed), lr not decayed, sort dataset = true, shuffle=true, skip padding; GRAD ACC (bs=50*5(!!!)) & lr = 0.1, step_size=50, gamma=0.1 & test_every 1 - SILLY ME, implemented GRAD ACC incorrectly

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/s7hwma0d

2*100 seems to have turned out best

100 epochs, full dataset, Total number of params = 16167398, src vocab = 26774, tgt vocab = 13795, bs=100*2 & lr = 0.1, step_size=50, gamma=0.1 & test_every 1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/99lrdx51

100 epochs, full dataset, Total number of params = 16133862, src vocab = 26643, tgt vocab = 13795, bs=100*5 & lr = 0.1, step_size=50, gamma=1 & test_every 1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/fhva9pab

100 epochs, full dataset, Total number of params = 26839398, src vocab = 26643, tgt vocab = 13795, hid_size 400 instead of 256, bs=50*5(!) & lr = 0.1, step_size=50, gamma=1 & test_every 1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/4nc1j7lj

100 epochs, full dataset, Total number of params = 19172998, src vocab = 26643, tgt vocab = 13795, hid_size 300 instead of 400, bs=50*10(!) & lr = 0.1, step_size=50, gamma=1 & test_every 1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/d9gjkoer

100 epochs, full dataset, Total number of params = 19172998, src vocab = 26643, tgt vocab = 13795, hid_size 300 instead of 400, bs=50*10(!) & lr = 0.1, step_size=50, gamma=1 & test_every 1, added clip_grad(1)

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/625igudj

SILLY ME - implemented BLEU incorrectly - new BLEU, adding 1000 examples every epoch once test_bleu > 1.5

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/z2vjlp74

New BLEU, on the full dataset right away - SILLY ME, not on the full dataset after all: implemented the gradual dataset growth incorrectly, so everything here got spoiled

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/3amsiq35?workspace=user-natalymr

New BLEU, on the full dataset right away

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2hyvjbwi
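
For reference, the pieces toggled across these runs - xavier init, dropout, clip_grad, StepLR, and pack_padded_sequence (the "skip padding" runs) - combine roughly as in the sketch below. The toy encoder, vocabulary size, and hyperparameters are illustrative assumptions, not the project's actual model:

```python
# Hedged sketch of how the knobs varied above fit together in PyTorch:
# xavier init, dropout, StepLR, clip_grad and pack_padded_sequence.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class Encoder(nn.Module):
    def __init__(self, vocab=1000, emb=64, hid=256, p=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb, padding_idx=0)
        self.drop = nn.Dropout(p)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)

    def forward(self, tokens, lengths):
        x = self.drop(self.embed(tokens))
        packed = pack_padded_sequence(x, lengths, batch_first=True,
                                      enforce_sorted=False)  # "skip padding"
        _, (h, c) = self.lstm(packed)
        return h, c

model = Encoder()
for p in model.parameters():                    # init = xavier_uniform_ / xavier_normal_
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

tokens = torch.tensor([[5, 7, 2, 0], [3, 9, 0, 0]])
lengths = torch.tensor([3, 2])
h, c = model(tokens, lengths)
loss = h.pow(2).mean()                          # dummy loss just to exercise the pieces
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 0.25)   # clip_grad(0.25)
optimizer.step()
scheduler.step()                                # once per epoch in the real loop
```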

natalymr commented Apr 27, 2020

If the gradients start to diverge (we overshot the local minimum):

decrease the lr, add dropout


If the gradients converge but the loss/acc is not what it should be (we are stuck in a local minimum):

increase the lr, weaken the dropout
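
A minimal sketch of how those two adjustments might be applied in a plain PyTorch setup; `scale_lr` and `set_dropout` are hypothetical helpers, not project code:

```python
# Hedged sketch of the two knobs above.
import torch.nn as nn

def scale_lr(optimizer, factor):
    # factor < 1 when gradients diverge, factor > 1 when stuck in a local minimum
    for group in optimizer.param_groups:
        group["lr"] *= factor

def set_dropout(model, p):
    # raise p to regularize harder, lower it to weaken dropout
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p
```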


Article about Batch Size
There might be critical consequences when using different batch sizes that should be taken into consideration when choosing one. Let’s cover two of the main potential
consequences of using small or large batch sizes:

  • Generalization: Large batch sizes may cause bad generalization (or even get stuck in
    a local minimum). Generalization means that the neural network will perform quite well on samples outside of the training set. So, bad generalization — which is pretty much overfitting — means that the neural network will perform poorly on samples outside of the training set.
  • Convergence speed: Small batch sizes may lead to slow convergence of the learning algorithm. The variable updates applied in every step, that were calculated using a
    batch of samples, will determine the starting point for the next batch of samples.
    Training samples are randomly drawn from the training set every step and therefore the
    resulting gradients are noisy estimates based on partial data. The fewer samples we use in a single batch, the noisier and less accurate the gradient estimates will be. That is, the smaller the batch, the bigger impact a single sample has on the applied variable updates. In other words, smaller batch sizes may make the learning process noisier and
    fluctuating, essentially extending the time it takes the algorithm to converge.

With all that in mind, we have to choose a batch size that will be neither too small nor too large but somewhere in between. The main idea here is that we should play around with
different batch sizes until we find one that would be optimal for the specific neural
network and dataset we are using.


Solution (survey)

One way to overcome the GPU memory limitations and run large batch sizes is to split the
batch of samples into smaller mini-batches, where each mini-batch requires an amount of
GPU memory that can be satisfied. These mini-batches can run independently, and their
gradients should be averaged or summed before calculating the model variable updates.
There are two main ways to implement this:

  • Data-parallelism — use multiple GPUs to train all mini-batches in parallel, each on a
    single GPU. The gradients from all mini-batches are accumulated and the result is used to
    update the model variables at the end of every step.
  • Gradient accumulation — run the mini-batches sequentially while accumulating the gradients. The accumulated results are used to update the model variables at the end of the last mini-batch.

So what is gradient accumulation, technically?

Gradient accumulation means running a configured number of steps without updating the model variables while accumulating the gradients of those steps and then using the accumulated gradients to compute the variable updates.
Yes, it’s really that simple.
Running some steps without updating any of the model variables is the way we —
logically — split the batch of samples into a few mini-batches. The batch of samples that is used in every step is effectively a mini-batch, and all the samples of those steps combined are effectively the global batch.
By not updating the variables at all those steps, we cause all the mini-batches to use the same model variables for calculating the gradients. This is mandatory to ensure the same gradients and updates are calculated as if we were using the global batch size.
Accumulating the gradients in all of these steps results in the same sum of gradients as if we were using the global batch size.

Iterating through an example

So, let’s say we are accumulating gradients over 5 steps. We want to accumulate the gradients of the first 4 steps, without updating any variable. At the fifth step, we want to use the accumulated gradients of the previous 4 steps combined with the gradients of the fifth step to compute and assign the variable updates. Let’s see it in action:

  1. Starting at the first step, all the samples of the first mini-batch propagate through the forward and backward passes, resulting in computed gradients for each trainable model variable. We don’t want to actually update the variables, so there is no need in computing the updates at this point. What we need, though, is a place to store the gradients of the first step, in order for them to be accessible in the following steps, and we will use another variable for each trainable model variable, to hold the accumulated gradients. So, after computing the gradients of the first step, we will store them in the variables we created for the accumulated gradients.
  2. Now the second step starts, and again, all the samples of the second mini-batch
    propagate through all the layers of the model, computing the gradients of the second step. Just like the step before, we don’t want to update the variables yet, so there is no need in computing the variable updates. What’s different than the first step though, is that instead of just storing the gradients of the second step in our variables, we are going to add them to the values stored in the variables, which currently hold the gradients of the first step.
  3. Steps 3 and 4 are pretty much the same as the second step, as we are not yet updating the variables, and we are accumulating the gradients by adding them to our variables.
  4. Then, in step 5, we do want to update the variables, as we intended to accumulate the gradients over 5 steps. After computing the gradients of the fifth step, we will add them to the accumulated gradients, resulting in the sum of all the gradients of those 5 steps.

We’ll then take this sum and insert it as a parameter to the optimizer, resulting in the updates computed using all the gradients of those 5 steps, computed over all the samples in the global batch.
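
A minimal sketch of this five-step walk-through in PyTorch, keeping explicit accumulator tensors exactly as described; the tiny linear model, random data, and hyperparameters are assumptions for illustration:

```python
# Hedged sketch of the walk-through above: explicit accumulator tensors, one per
# trainable parameter, summed over 5 mini-batches before a single optimizer update.
import torch
import torch.nn as nn

model = nn.Linear(8, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
ACCUM_STEPS = 5

accumulators = [torch.zeros_like(p) for p in model.parameters()]
for step in range(1, 21):                              # 20 mini-batches -> 4 real updates
    x, y = torch.randn(10, 8), torch.randn(10, 4)
    model.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()                                    # gradients of this mini-batch only
    for acc, p in zip(accumulators, model.parameters()):
        acc += p.grad                                  # steps 1..5: add into the accumulators
    if step % ACCUM_STEPS == 0:                        # step 5: apply the accumulated gradients
        for acc, p in zip(accumulators, model.parameters()):
            p.grad = acc / ACCUM_STEPS                 # average (summing also works, as noted above)
        optimizer.step()
        accumulators = [torch.zeros_like(p) for p in model.parameters()]
```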


Solution (implementation)

https://discuss.pytorch.org/t/how-to-implement-accumulated-gradient/3822
(code screenshots)
AND THREE MORE IMPLEMENTATION VARIANTS:
https://discuss.pytorch.org/t/why-do-we-need-to-set-the-gradients-manually-to-zero-in-pytorch/4903/20?u=alband
(code screenshot)
From the same discussion:
(code screenshot)
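
Since the screenshots did not survive, here is a hedged sketch of the shortcut those threads revolve around: PyTorch already sums into `.grad` across `backward()` calls, so it is enough to scale the loss and delay `step()`/`zero_grad()`. The toy model and numbers are illustrative only:

```python
# Hedged sketch of the usual PyTorch shortcut for gradient accumulation.
import torch
import torch.nn as nn

model = nn.Linear(8, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
ACCUM_STEPS = 5

optimizer.zero_grad()
for step in range(1, 21):
    x, y = torch.randn(10, 8), torch.randn(10, 4)
    loss = criterion(model(x), y) / ACCUM_STEPS        # so the summed grads match the big-batch mean
    loss.backward()                                    # adds into .grad instead of overwriting it
    if step % ACCUM_STEPS == 0:
        optimizer.step()                               # one update per 5 mini-batches
        optimizer.zero_grad()
```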

natalymr commented May 6, 2020

200 tokens

1000 dataset (500val/test) bs = 250, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/0yjtnd2s

ALL dataset, Total number of params = 26230872, src vocab = 41607, tgt vocab = 18069, bs = 250, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1p2c36bd
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/w3ao2qls

ALL dataset, Total number of params = 10497688, src vocab = 41607, tgt vocab = 18069, bs = 500, hid_size = 128 instead of 300, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ljc12vrp

Training on 1000 commits, evaluating on the full test/val, hid_size=300, bs=25*10, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ha8nper1

Training on 1000 commits FROM THE END, evaluating on the full test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.8, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ehtguhaa

Training on 5000 commits, evaluating on the full test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.8, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/0ux74572

Sanity-check run: start by training on 1000 examples; once test_bleu > 0.5, add 500 examples to the train set every epoch; evaluate on the full test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.8, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/ojqkbrjw

Start by training on 1000 examples; once test_bleu > 1.5, add 500 examples to the train set every epoch; evaluate on the full test/val, hid_size=300, bs=25*10, lr step_size=10, gamma=0.7, lr=0.1

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/q6jh7kqd

Changed the way BLEU is computed

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/1k6wlhkq

Adding 1000 commits at a time

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/31y4gm86?workspace=user-natalymr

Adding 500 commits at a time

https://app.wandb.ai/natalymr/nmt-2.0-test/runs/2ekge9aq
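
The incremental schedule from these runs (train on a small prefix, then grow the train set once test BLEU crosses a threshold) can be sketched roughly as below; the callback names and the toy usage are assumptions, not the actual training script:

```python
# Hedged sketch of the growing-train-set schedule described above: start with 1000
# commits and, once test BLEU crosses the threshold, add 500 more each epoch.
START_SIZE = 1000
CHUNK = 500
BLEU_THRESHOLD = 0.5

def run(full_train, num_epochs, train_one_epoch, compute_test_bleu):
    train_size, growing = START_SIZE, False
    for epoch in range(num_epochs):
        train_one_epoch(full_train[:train_size])
        if compute_test_bleu() > BLEU_THRESHOLD:
            growing = True                              # once triggered, keep adding every epoch
        if growing:
            train_size = min(train_size + CHUNK, len(full_train))

# Toy usage with dummy callbacks, just to show how the train set grows:
sizes, bleu = [], iter([0.2, 0.4, 0.6, 0.9, 1.2])
run(list(range(5000)), 5,
    train_one_epoch=lambda examples: sizes.append(len(examples)),
    compute_test_bleu=lambda: next(bleu))
print(sizes)   # [1000, 1000, 1000, 1500, 2000]
```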

natalymr commented May 11, 2020

NMT

100 tokens

on the full dataset
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/6y01tlo1

200 tokens

adding 500 at a time
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/20rquvgm
after adding new data to the train set, wait 4 epochs, then add again
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/iio91nz5
lowered the lr (0.01)
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/i7l6s0dg

natalymr commented May 17, 2020

New Dataset (match with code2seq)

NMT


100 tokens

the old way: add a few commits, then train for 4 epochs

https://app.wandb.ai/natalymr/nmt-1.0-test/runs/8p3nb59w?workspace=user-natalymr

all the data at once

https://app.wandb.ai/natalymr/nmt-1.0-test/runs/d97basok?workspace=user-natalymr

all the data at once, 2x more epochs

https://app.wandb.ai/natalymr/nmt-1.0-test/runs/9kdczc93?workspace=user-natalymr

changed the lr

the run was interrupted
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/vylsmbo9?workspace=user-natalymr
full run
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/1yinbj6a?workspace=user-natalymr


200 tokens

lr = 0.01, reduced 10x after 150 epochs, 250 epochs
https://app.wandb.ai/natalymr/nmt-1.0-test/runs/ucukw8kl

natalymr commented May 18, 2020

New Dataset (match with code2seq)

NMT-2

100 tokens

Full dataset, lr=0.01
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/og66z2nl
400 epochs:
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/6c7f56mo
250 epochs, LR not decayed
https://app.wandb.ai/natalymr/nmt-2.0-test/runs/9k5usrqq
