Wrong checkpoint metric used in load_best_model #46

geoffreyangus · 2019-12-15T02:09:48Z

Describe the bug
The wrong checkpoint_metric is used in load_best_model at the end of EmmentalLearner.learn. I believe this happens because utils.merge doesn't delete entries, it just replaces the ones whose keys match. This leaves us with multiple entries in logging_config.checkpointer_config.checkpoint_metric: the default one and the one from the updated config.
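To illustrate the suspected cause, here is a minimal sketch of a recursive dictionary merge that replaces matching keys but never removes existing ones. It is not Emmental's actual utils.merge; merge_sketch and the key 'some/default/metric' are hypothetical names used only for illustration.

```python
import copy


def merge_sketch(base, update):
    """Recursively merge `update` into a copy of `base`.

    Hypothetical helper, not Emmental's utils.merge. Matching keys are
    replaced; keys that exist only in `base` are kept, never deleted.
    """
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend instead of overwriting.
            merged[key] = merge_sketch(merged[key], value)
        else:
            merged[key] = value
    return merged


# 'some/default/metric' stands in for whatever the default config defines.
default_config = {
    "checkpointer_config": {"checkpoint_metric": {"some/default/metric": "min"}}
}
user_config = {
    "checkpointer_config": {"checkpoint_metric": {"model/all/valid/loss": "min"}}
}

merged = merge_sketch(default_config, user_config)
metric = merged["checkpointer_config"]["checkpoint_metric"]
print(metric)  # both the default entry and the user entry are present
```

Under these assumptions the merged checkpoint_metric holds two entries, with the default one first, which matches the symptom described above.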

To Reproduce
Steps to reproduce the behavior:

1. Initialize an Emmental experiment
2. Run the following code snippet:

Meta.update_config(config={
    'learner_config': {
        'n_epochs': 2,
        'valid_split': 'valid',
        'optimizer_config': {'optimizer': 'adam', 'lr': 0.01, 'l2': 0.000},
        'lr_scheduler_config': {}
    },
    'logging_config': {
        'evaluation_freq': 1,
        'checkpointing': True,
        'checkpointer_config': {
            'checkpoint_metric': {
                'model/all/valid/loss': 'min'
            }
        }
    }
})
print(Meta.config['logging_config'])

At this point, the printed config should show multiple values in logging_config.checkpointer_config.checkpoint_metric. To see how this affects downstream tasks, run EmmentalLearner.learn:

...
model = EmmentalModel(name='model', tasks=tasks)
learner = EmmentalLearner()
# dataloaders is assumed to be defined in the elided setup above
learner.learn(model, dataloaders)

Finally, print list(learner.logging_manager.checkpointer.checkpoint_metric.keys())[0]. This is the value that Checkpointer.load_best_model uses to determine whether a best model was found (checkpointer.py, line ~253). The value from the default config appears here instead of the value from the updated config.
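The effect of taking the first key can be sketched in isolation. The dictionary below is a stand-in for the merged checkpoint_metric, and 'some/default/metric' is a hypothetical placeholder for the default entry, not the real default name.

```python
# Stand-in for the merged checkpoint_metric: the default entry was inserted
# first, the entry from the updated config second.
checkpoint_metric = {
    "some/default/metric": "min",   # hypothetical default entry
    "model/all/valid/loss": "min",  # entry from the updated config
}

# Same selection as described in the report: take the first key.
first_metric = list(checkpoint_metric.keys())[0]
print(first_metric)

# The entry the user asked for is present, just not in first position.
print("model/all/valid/loss" in checkpoint_metric)
```

Dictionaries preserve insertion order in CPython 3.6 as an implementation detail and in Python 3.7+ by language guarantee, so under these assumptions the first key is always the older, default entry.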

Expected behavior
I expect the checkpoint metric I defined in the updated config to be used in Checkpointer.load_best_model.

Environment

OS: Ubuntu 16.04
Emmental Version: 0.0.4
Python 3.6

senwu · 2019-12-17T22:46:14Z

This issue should be fixed in #47. Please check! Feel free to reopen if the issue still exists.

senwu closed this as completed Dec 17, 2019
