Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Another opt bug fix ('init_model' not being set correctly) #3162

Merged
merged 4 commits into from Oct 8, 2020

Conversation

emilydinan
Copy link
Contributor

@emilydinan emilydinan commented Oct 6, 2020

Patch description
create_agent_from_opt_file was not properly copying over 'init_model' as set by opt. This causes issues, when, for example, we have a model that has been (1) initialized with some model file (2) requeued and has (3) load_from_checkpoint = True. When requeued, the 'init_model' will revert back to the model set in (1) inside create_agent_from_opt_file, instead of using the checkpoint file set by the train script (which is contained in opt['init_model']). This in turn leads to TorchAgent loading the model file from the best valid instead of the checkpoint.

Thanks to @wyshi for finding this bug!!

@@ -325,6 +325,9 @@ def create_agent_from_opt_file(opt: Opt):
opt_from_file[k] = v

opt_from_file['model_file'] = model_file # update model file path
if opt.get('init_model') is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple quick questions:

  1. why is this not handled in lines 309-315?
  2. why is this not handled in lines 323-325?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. when train_model.py sets opt['init_model'] to the checkpoint, it does not add it into override. adding to override could be one possible solution, what do you think?
  2. it isn't handled here in the specific case where init_model did exist in opt_from_file (i.e. (1) the PR description). ex: finetuning a model on BST that's initialized with Reddit and it gets requeued.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know which is better, but i'm very slightly leaning towards adding it to override in train model, it feels marginally better than carving out a special case for init_model in these opt functions. do you have an opinion either way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know if I like either. I think adding to override is hacky and can't be trusted lol. But also carving out this special exception isn't very nice either... @stephenroller ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both seem equally gross to me. This solution, with comments explaining wtf is going on, and a test checking that we do it correctly, seems sufficient.

Should we kill override, then we'll at least catch doing it wrong in the future.

Copy link
Contributor

@stephenroller stephenroller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a test that would catch this?

Copy link
Contributor

@stephenroller stephenroller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the comment discussing what's going on and why please

@@ -325,6 +325,9 @@ def create_agent_from_opt_file(opt: Opt):
opt_from_file[k] = v

opt_from_file['model_file'] = model_file # update model file path
if opt.get('init_model') is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both seem equally gross to me. This solution, with comments explaining wtf is going on, and a test checking that we do it correctly, seems sufficient.

Should we kill override, then we'll at least catch doing it wrong in the future.

@emilydinan emilydinan merged commit bca1dd1 into master Oct 8, 2020
@emilydinan emilydinan deleted the optdoesitagain branch October 8, 2020 17:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants