
RES-2190: Fix labels set to -100 in finetuning tasks #517

Merged
11 commits merged into numenta:master on May 25, 2021

Conversation

benja-matic
Contributor

Regarding RES-2190. It looks like the HuggingFace trainer reads args.dataloader_drop_last for both the train and eval loaders. The workaround is to flip that to False during evaluation in compute_metrics_task, then flip it back to True at the end (if it was originally True). A few minor, ignorable formatting edits will merge in as well. Finally, there's a finetuning experiment for tiny_bert50k.
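A minimal sketch of that toggle, assuming the structure described above (compute_metrics_task and trainer.args.dataloader_drop_last are named in this thread; the signature is simplified and may differ from the repo):

def compute_metrics_task(trainer):
    # The HuggingFace Trainer reads args.dataloader_drop_last when building
    # both the train and eval dataloaders, so disable it only for evaluation.
    drop_last = trainer.args.dataloader_drop_last
    trainer.args.dataloader_drop_last = False
    try:
        results = trainer.evaluate()
    finally:
        # Flip it back so training still drops the last partial batch
        # if that was originally requested.
        trainer.args.dataloader_drop_last = drop_last
    return results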

@mvacaporale
Contributor

mvacaporale commented May 19, 2021

Good find. The solution is simple and straightforward. Note, there could be other ways. For one, we could override get_eval_dataloader, but I think that would be a bit cumbersome. I think yours will work quite well for our needs.
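For reference, a rough sketch of that alternative, overriding get_eval_dataloader on a Trainer subclass. This is not what the PR does; the class name is made up for illustration, and the real Trainer method also handles samplers and distributed setups, which is part of what makes it cumbersome:

from torch.utils.data import DataLoader
from transformers import Trainer


class EvalNoDropLastTrainer(Trainer):
    def get_eval_dataloader(self, eval_dataset=None):
        # Always build the eval dataloader with drop_last=False, regardless
        # of args.dataloader_drop_last, so no evaluation examples are dropped.
        eval_dataset = eval_dataset if eval_dataset is not None else self.eval_dataset
        return DataLoader(
            eval_dataset,
            batch_size=self.args.eval_batch_size,
            collate_fn=self.data_collator,
            drop_last=False,
        )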

@@ -144,6 +144,12 @@
)


initial_finetuning_tiny_bert50k_glue = deepcopy(finetuning_bert700k_glue)
initial_finetuning_tiny_bert50k_glue.update(
Contributor

Why initial_finetuning_tiny_bert50k_glue as opposed to just finetuning_tiny_bert50k_glue? What does the initial signify?

@@ -52,7 +52,7 @@
from transformers.integrations import is_wandb_available
from transformers.trainer_utils import get_last_checkpoint, is_main_process

from experiments import CONFIGS
from experiments import CONFIGS
Contributor

I think you added an extra space at the end of CONFIGS. Make sure you run flake8 in whatever repo you're submitting a PR for to catch these issues.

cd ~/nta/nupic.research
flake8

test_dataset = None
if ((data_args.task_name is not None or data_args.test_file is not None)
and training_args.do_predict):
and training_args.do_predict):
Contributor

You removed some whitespace here. Flake8 would have caught this as well.

vals, cnts = np.unique(np.array(eval_dataset['label']), return_counts=True)
logging.info(f"Label distribution for {task} before calling trainer.evaluate")
for val in range(len(vals)):
logging.info(f"Label={vals[val]}: {cnts[val]} examples")
Contributor

Did you intentionally keep this logging? Do you think we need it?

preds = preds[np.where(ep.label_ids != -100)]
logging.info(f"Removing {1-(len(label_ids) / len(ep.label_ids)):.2%} samples "
"from evaluation set where label == -100")
assert -100 not in ep.label_ids, "unknown source of -100 labels"
Contributor

Nitpick: could you fix the punctuation? "Unknown source of -100 labels."
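For context, -100 is the label index that HuggingFace data collators and PyTorch's cross-entropy loss treat as "ignore", which is why those rows are filtered out before computing metrics. A minimal sketch of that filtering inside a compute_metrics function, mirroring the snippet above (ep is a transformers.EvalPrediction; the accuracy metric is only illustrative):

import numpy as np


def compute_metrics(ep):
    # ep.predictions holds the logits, ep.label_ids the integer labels.
    preds = np.argmax(ep.predictions, axis=1)
    # Drop every position whose label is the ignore index (-100).
    keep = ep.label_ids != -100
    preds = preds[keep]
    label_ids = ep.label_ids[keep]
    return {"accuracy": float((preds == label_ids).mean())}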

@@ -916,7 +930,7 @@ def model_init():
"HP search with saved models not supported."
logging.info("Pretraining new model from scratch")

# Instantiate model; possibly one of our custom sparse models.
# Instantiate model; possibly one of our custom sparse models.
Contributor

The indentation looks like it was changed here as well.

Contributor

@lucasosouza left a comment

This also seems like a good solution to me. It is a hack, but it should work fine with no extra computation or memory required. I left some very minor comments; please take a look.

test_dataset = tokenized_datasets[
"test_matched" if data_args.task_name == "mnli" else "test"
]
if (data_args.task_name is not None or data_args.test_file is not None):
Contributor

I didn't understand this change. What is the difference between the old version and the new one?

Contributor Author

No semantic difference. I just couldn't figure out how to make it PEP 8 compliant. I was getting issues about the indent or the length of the line, so I just broke it up.

# if you want drop_last for training, this toggles it back on
if drop_last:
trainer.args.dataloader_drop_last = True
logging.info("Switched trainer.args.dataloader_drop_last"
Contributor

I think we should lower the level of this logging; logging.debug should be enough. There is no useful information for the user here; it is more of a print for debugging purposes.

Contributor Author

Got it. Those were mainly to verify that my fix was working properly and aren't needed going forward. I removed the bit above and a couple of related ones.

@mvacaporale
Contributor

mvacaporale commented May 20, 2021

@benja-matic Do you know how this change affects the fine-tuning results? We should probably rerun those from the README (including bert_1mi, bert_100k, sparse_80%_kd_onecycle_lr_rigl, and sparse_80%_kd_onecycle_lr).

@lucasosouza What do you think? Is this necessary? My concern is that it may be harder to contextualize new results if we don't rerun previous fine-tuning experiments. Or, at the very least, we should rerun one or two just to make sure the results are negligibly affected.

@benja-matic
Contributor Author


@mvacaporale I don't know just yet. I'm rerunning fine-tuning on the baseline models this afternoon.

@benja-matic benja-matic merged commit bc38d37 into numenta:master May 25, 2021