Never ending training #22

Open
astariul opened this issue Jun 3, 2019 · 7 comments

Comments

@astariul

astariul commented Jun 3, 2019

I'm running your code on the CNN/Dailymail dataset.

However, training never ends, displaying:

Batch #X

with X growing indefinitely. I waited a long time, then killed the process.

But now, when I run the inference code, the produced summary is very bad. Example:

the two - year - year - year - old cate - old cat was found in the animal .

What did I do wrong? Is anyone else in the same situation who succeeded in fixing it? (@Vibha111094)

@Vibha111094

Set your warmup steps to 10 percent of the total number of iterations required. In my case 15,000 helped, but please double-check.
Also, please make sure you are sending the delimiter (i.e. [SEP]) as an indicator to stop decoding when creating the TF record, i.e.:

labels_tgt = input_ids_tgt[1:]
input_ids_tgt = input_ids_tgt[:-1]
input_mask_src = [1] * len(input_ids_src)
input_mask_tgt = [1] * len(input_ids_tgt)
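
For reference, a minimal sketch of how those fields might be written into a TFRecord. The helper names and feature keys below are illustrative assumptions, not necessarily what this repo uses:

import tensorflow as tf

def int64_feature(values):
    # Wrap a list of ints as a tf.train.Feature.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

def write_example(writer, input_ids_src, input_ids_tgt):
    # Assumes input_ids_tgt already starts with [CLS] and ends with [SEP],
    # so shifting by one token gives decoder inputs vs. labels.
    labels_tgt = input_ids_tgt[1:]
    input_ids_tgt = input_ids_tgt[:-1]
    input_mask_src = [1] * len(input_ids_src)
    input_mask_tgt = [1] * len(input_ids_tgt)
    features = tf.train.Features(feature={
        'input_ids_src': int64_feature(input_ids_src),
        'input_mask_src': int64_feature(input_mask_src),
        'input_ids_tgt': int64_feature(input_ids_tgt),
        'input_mask_tgt': int64_feature(input_mask_tgt),
        'labels_tgt': int64_feature(labels_tgt),
    })
    writer.write(tf.train.Example(features=features).SerializeToString())

# Usage (hypothetical file name and id lists):
# with tf.io.TFRecordWriter('train.tfrecord') as writer:
#     write_example(writer, src_ids, tgt_ids)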

@ishurironaldinho


I ran the inference code, but I don't know how to produce the summary.

Should I post the original story through Postman, so it will give back a summary?

@thatianafernandes

Set your warmup steps to 10 percent of the total number of iterations required. In my case 15,000 helped, but please double-check.

Where exactly can I set that?

@Vibha111094

In config.py you would have:

lr = {
    'learning_rate_schedule': 'constant.linear_warmup.rsqrt_decay.rsqrt_depth',
    'lr_constant': 2 * (hidden_dim ** -0.5),
    'static_lr': 1e-3,
    'warmup_steps': 10000,
}

You could increase warmup_steps to around 15,000-20,000.
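
If it helps to see what warmup_steps actually changes, here is a rough sketch of the standard linear-warmup / inverse-square-root-decay schedule that the name above suggests. This is purely illustrative (the hidden_dim value and function name are assumptions); check the repo's optimizer code for the exact formula:

def lr_at_step(step, warmup_steps=10000, lr_constant=2 * (512 ** -0.5)):
    # Learning rate ramps up linearly until warmup_steps, then decays as 1/sqrt(step).
    step = max(step, 1)
    return lr_constant * min(step * warmup_steps ** -1.5, step ** -0.5)

With a larger warmup_steps, the peak learning rate is reached later and is lower, which often stabilizes early training.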

@mishrachinmaya689

When I set low numbers (steps = 10, warmup steps = 10, max eval = 10), the iteration counter still goes past 150 for epoch 0. Could you help clarify how those numbers are interlinked?

@xieyxclack

xieyxclack commented Nov 30, 2019


Hello, I used the default settings and obtained ROUGE-1/2/L: 39.29/17.30/27.10. The ROUGE-L result in particular is poor. I trained on 1 GPU for 3 days, about 170k steps in total with batch size = 32.
Could you provide your results on the CNN/Dailymail dataset, or do you know what is wrong?
Many thanks! @Vibha111094
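
In case it helps with comparing numbers, here is a small sketch of how ROUGE can be computed with Google's rouge-score package. This is just one possible scorer (an assumption on my side; papers usually report the official perl ROUGE-1.5.5 / pyrouge scores, which can differ slightly), and the two summaries below are made-up examples:

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
reference_summary = "the two year old cat was found safe at the animal shelter ."
predicted_summary = "the two - year - year - year - old cate - old cat was found in the animal ."
# score(reference, prediction) returns precision/recall/F1 per metric.
scores = scorer.score(reference_summary, predicted_summary)
print({k: round(v.fmeasure, 4) for k, v in scores.items()})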

@Shanzaay

I am following the default settings, but after the second epoch it's taking too long. Does anyone else face the same problem?
