save trained model to checkpoint? #1

yuyangxie96 · 2022-03-24T07:11:01Z

I noticed that line 158 of main.py controls whether the model is saved to the checkpoint. When I run lra-text, the trained model is not saved, and the model seems to overfit. Is there a problem or mistake here?

pkuzengqi · 2022-03-30T01:20:21Z

Checkpoint saving:
See Skyformer/src/main.py, line 156 in cfe8c8c:

    if dev_accu > best_dev_accu:
        best_dev_accu = dev_accu
        if (train_step_idx + 1) > total_step * 0.2:
            torch.save({"model_state_dict": model.state_dict()}, checkpoint_path)
            print('best model saved: step = ', train_step_idx, 'dev accu = ', dev_accu)

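The save is deliberately gated so that checkpoints are not written too frequently: line 158 only allows a save after the first 20% of training steps. For lra-text, which overfits quickly, the best dev accuracy can occur before that point, in which case nothing is saved. You can move the torch.save() call out of that inner check, or disable line 158.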
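
As a rough sketch of what relaxing that gate does (this is not the repository's code), the check can be pulled into a small helper with an adjustable threshold. The names `update_best`, `simulate`, `save_fn`, and `min_frac` are made up for illustration, and the dev accuracies are invented numbers. `min_frac=0.2` mirrors the snippet above, and `min_frac=0.0` saves on every new best.

```python
def update_best(dev_accu, best_dev_accu, train_step_idx, total_step,
                save_fn, min_frac=0.2):
    """Track the best dev accuracy and call save_fn() on a new best.

    min_frac=0.2 mirrors the gate in the quoted snippet; min_frac=0.0
    saves on every improvement. Returns (best_dev_accu, saved).
    """
    saved = False
    if dev_accu > best_dev_accu:
        best_dev_accu = dev_accu
        if (train_step_idx + 1) > total_step * min_frac:
            # In main.py this would be the existing torch.save(...) call.
            save_fn()
            saved = True
    return best_dev_accu, saved


def simulate(dev_accus, min_frac):
    """Return the step indices at which a checkpoint would be written."""
    saved_steps = []
    best = 0.0
    total_step = len(dev_accus)
    for step, accu in enumerate(dev_accus):
        best, _ = update_best(
            accu, best, step, total_step,
            save_fn=lambda s=step: saved_steps.append(s),
            min_frac=min_frac,
        )
    return saved_steps


# Invented dev accuracies that peak early, as in a quickly overfitting run.
early_peak = [0.60, 0.64, 0.63, 0.62, 0.61, 0.60, 0.60, 0.59, 0.59, 0.58]
print("saved with 20% gate:", simulate(early_peak, min_frac=0.2))
print("saved without gate: ", simulate(early_peak, min_frac=0.0))
```

In the actual training loop, `save_fn` would wrap the existing `torch.save({"model_state_dict": model.state_dict()}, checkpoint_path)` call. With the gate removed, the checkpoint is overwritten on every improvement, which costs more I/O but keeps the early best model.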

Overfitting:
All models seem to overfit on this dataset. I posted a dev-loss figure on the last page of https://arxiv.org/pdf/2112.05359.pdf for reference.
