Another opt bug fix ('init_model' not being set correctly) #3162
Conversation
@@ -325,6 +325,9 @@ def create_agent_from_opt_file(opt: Opt):
            opt_from_file[k] = v

    opt_from_file['model_file'] = model_file  # update model file path
    if opt.get('init_model') is not None:
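For context, a minimal sketch of the merge logic this hunk is adding. This is not ParlAI's actual code; the helper name and the bare dicts standing in for `Opt` are illustrative:

```python
def merge_init_model(opt, opt_from_file, model_file):
    """Illustrative stand-in for the relevant lines of
    create_agent_from_opt_file: prefer the live opt's init_model (e.g. a
    checkpoint path set by the train script) over the stale value that was
    saved in the model's .opt file."""
    opt_from_file['model_file'] = model_file  # update model file path
    if opt.get('init_model') is not None:
        # the train script may have pointed init_model at a checkpoint;
        # without this, the value saved at model-creation time would win
        opt_from_file['init_model'] = opt['init_model']
    return opt_from_file
```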
a couple quick questions:
- why is this not handled in lines 309-315?
- why is this not handled in lines 323-325?
- When train_model.py sets opt['init_model'] to the checkpoint, it does not add it to override. Adding it to override could be one possible solution; what do you think?
- It isn't handled here in the specific case where init_model did exist in opt_from_file (i.e. case (1) in the PR description). Ex: finetuning a model on BST that's initialized with Reddit, and the job gets requeued.
I don't know which is better, but I'm very slightly leaning towards adding it to override in train model; it feels marginally better than carving out a special case for init_model in these opt functions. Do you have an opinion either way?
I don't really know if I like either. I think adding to override is hacky and can't be trusted lol. But also carving out this special exception isn't very nice either... @stephenroller ?
Both seem equally gross to me. This solution, with comments explaining wtf is going on, and a test checking that we do it correctly, seems sufficient.
Should we kill override? Then we'd at least catch doing it wrong in the future.
Could we add a test that would catch this?
Add a comment discussing what's going on and why, please.
Patch description
`create_agent_from_opt_file` was not properly copying over `'init_model'` as set by `opt`. This causes issues when, for example, we have a model that has been (1) initialized with some model file, (2) requeued, and has (3) `load_from_checkpoint = True`. When requeued, the `'init_model'` will revert back to the model set in (1) inside `create_agent_from_opt_file`, instead of using the checkpoint file set by the train script (which is contained in `opt['init_model']`). This in turn leads to TorchAgent loading the model file from the best valid instead of the checkpoint.

Thanks to @wyshi for finding this bug!!
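A self-contained sketch of the regression test the reviewers asked for. The merge helper below is a hypothetical stand-in for the relevant part of `create_agent_from_opt_file`, not ParlAI's real API; paths are made up:

```python
def merge_opt_from_file(opt, opt_from_file, model_file):
    # stand-in for the fixed merge logic: the live opt's init_model
    # (the checkpoint set by the train script) must win over the stale
    # init_model stored in the .opt file on disk
    opt_from_file['model_file'] = model_file
    if opt.get('init_model') is not None:
        opt_from_file['init_model'] = opt['init_model']
    return opt_from_file


def test_requeue_prefers_checkpoint():
    # scenario from the PR: (1) initialized from a Reddit model,
    # then requeued with load_from_checkpoint = True
    opt = {'init_model': '/runs/model.checkpoint'}       # set by train script
    opt_from_file = {'init_model': '/zoo/reddit_model'}  # stale value from (1)
    merged = merge_opt_from_file(opt, opt_from_file, '/runs/model')
    assert merged['init_model'] == '/runs/model.checkpoint'
    assert merged['model_file'] == '/runs/model'


test_requeue_prefers_checkpoint()
```

Before the fix, the stale `/zoo/reddit_model` value from disk would survive the merge and the first assertion would fail.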