Never ending training #22

Open
astariul opened this issue Jun 3, 2019 · 7 comments

Comments

@astariul

astariul commented Jun 3, 2019

I'm running your code on the CNN/Dailymail dataset.

However, training never ends, displaying:

Batch #X

with X growing indefinitely. I waited a long time, then killed the process.

But now, when I run the inference code, the produced summary is very bad. Example:

the two - year - year - year - old cate - old cat was found in the animal .

What did I do wrong? Is anyone else in the same situation who succeeded in fixing it? (@Vibha111094)

@Vibha111094

Set your warmup steps to 10 percent of the total number of iterations required. In my case 15,000 helped, but please double-check.
Also, please make sure you are sending the delimiter (i.e. [SEP]) as an indicator to stop decoding when creating the TF record, i.e.:

labels_tgt = input_ids_tgt[1:]
input_ids_tgt = input_ids_tgt[:-1]
input_mask_src = [1] * len(input_ids_src)
input_mask_tgt = [1] * len(input_ids_tgt)
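
For reference, a minimal sketch of how those fields might be written into a TFRecord. The helper names and feature keys below are illustrative assumptions, not necessarily what this repo uses:

import tensorflow as tf

def int64_feature(values):
    # Wrap a list of ints as a tf.train.Feature.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

def write_example(writer, input_ids_src, input_ids_tgt):
    # Assumes input_ids_tgt already starts with [CLS] and ends with [SEP],
    # so shifting by one token gives decoder inputs vs. labels.
    labels_tgt = input_ids_tgt[1:]
    input_ids_tgt = input_ids_tgt[:-1]
    input_mask_src = [1] * len(input_ids_src)
    input_mask_tgt = [1] * len(input_ids_tgt)
    features = tf.train.Features(feature={
        'input_ids_src': int64_feature(input_ids_src),
        'input_mask_src': int64_feature(input_mask_src),
        'input_ids_tgt': int64_feature(input_ids_tgt),
        'input_mask_tgt': int64_feature(input_mask_tgt),
        'labels_tgt': int64_feature(labels_tgt),
    })
    writer.write(tf.train.Example(features=features).SerializeToString())

# Usage (hypothetical file name and id lists):
# with tf.io.TFRecordWriter('train.tfrecord') as writer:
#     write_example(writer, src_ids, tgt_ids)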

@ishurironaldinho


I ran the inference code, but I don't know how to produce the summary.

Should I post the original story through Postman, so it will give back a summary?

@thatianafernandes

Set your warmup steps to 10 percent of the total number of iterations required. In my case 15,000 helped, but please double-check.

Where exactly can I set that?

@Vibha111094

In config.py you would have:

lr = {
    'learning_rate_schedule': 'constant.linear_warmup.rsqrt_decay.rsqrt_depth',
    'lr_constant': 2 * (hidden_dim ** -0.5),
    'static_lr': 1e-3,
    'warmup_steps': 10000,
}

You could increase warmup_steps to around 15,000-20,000.
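
If it helps to see what warmup_steps actually changes, here is a rough sketch of the standard linear-warmup / inverse-square-root-decay schedule that the name above suggests. This is purely illustrative (the hidden_dim value and function name are assumptions); check the repo's optimizer code for the exact formula:

def lr_at_step(step, warmup_steps=10000, lr_constant=2 * (512 ** -0.5)):
    # Learning rate ramps up linearly until warmup_steps, then decays as 1/sqrt(step).
    step = max(step, 1)
    return lr_constant * min(step * warmup_steps ** -1.5, step ** -0.5)

With a larger warmup_steps, the peak learning rate is reached later and is lower, which often stabilizes early training.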

@mishrachinmaya689

When I set low numbers (steps = 10, warmup steps = 10, max eval = 10), the iteration counter still goes past 150 for epoch 0. Could you help clarify how those numbers are interlinked?

@xieyxclack

xieyxclack commented Nov 30, 2019


Hello, I used the default settings and obtained ROUGE-1/2/L: 39.29/17.30/27.10. The ROUGE-L result in particular is poor. I trained on 1 GPU for 3 days, about 170k steps in total with batch size = 32.
Could you provide your results on the CNN/Dailymail dataset, or do you know what is wrong?
Many thanks! @Vibha111094
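
In case it helps with comparing numbers, here is a small sketch of how ROUGE can be computed with Google's rouge-score package. This is just one possible scorer (an assumption on my side; papers usually report the official perl ROUGE-1.5.5 / pyrouge scores, which can differ slightly), and the two summaries below are made-up examples:

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
reference_summary = "the two year old cat was found safe at the animal shelter ."
predicted_summary = "the two - year - year - year - old cate - old cat was found in the animal ."
# score(reference, prediction) returns precision/recall/F1 per metric.
scores = scorer.score(reference_summary, predicted_summary)
print({k: round(v.fmeasure, 4) for k, v in scores.items()})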

@Shanzaay

I am following the default settings, but after the second epoch it's taking too long. Does anyone else face the same problem?
