'adam_m not found in checkpoint' when further pretraining #45

Closed

DayuanJiang opened this issue Apr 18, 2020 · 6 comments

Comments

@DayuanJiang

When I was trying to further pretrain the model on domain-specific data in Colab, I ran into a problem: the official pretrained model could not be loaded.

Here is the command for further pretraining.

hparam =    '{"model_size": "small", \
             "use_tpu":true, \
             "num_tpu_cores":8, \
             "tpu_name":"grpc://10.53.161.26:8470", \
             "num_train_steps":4000100,\
             "pretrain_tfrecords":"gs://tweet_torch/electra/electra/data/pretrain_tf_records/pretrain_data.tfrecord*", \
             "model_dir":"gs://tweet_torch/electra/electra/data/electra_small/", \
             "generator_hidden_size":1.0\
            }'
!python electra/run_pretraining.py  \
                    --data-dir "gs://tweet_torch/electra/electra/data/" \
                    --model-name "electra_small" \
                    --hparams '{hparam}'

The error message is quite long, so I am pasting only the part that seems relevant.

ERROR:tensorflow:Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
From /job:worker/replica:0/task:0:
Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
	 [[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
@pnhuy commented Apr 26, 2020

I also had the same problem.

It seems that the adam_m variables were removed from the checkpoint before it was saved (google-research/bert#99 (comment)).

So without the full checkpoint, we can't resume training as-is.

Just waiting for the full checkpoint.
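
You can confirm this by listing what the released checkpoint actually contains, e.g. with tf.train.list_variables (a minimal sketch; the checkpoint path below is a placeholder, point it at your copy):

# Sketch: inspect the released checkpoint to confirm that the Adam slot
# variables (*/adam_m, */adam_v) were stripped before release.
# TF 1.x, as used in this thread; the path is a placeholder.
import tensorflow as tf

ckpt = "electra_small/model.ckpt"  # placeholder path
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)
# The model weights are listed, but no */adam_m or */adam_v keys appear,
# which is exactly why save/RestoreV2 fails during further pretraining.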

@clarkkev (Collaborator)

You should be able to do further training; just don't initialize the Adam parameters from the checkpoint, by doing something like this. I don't think refreshing the Adam parameters will cause any real problem with the model.
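
For a concrete picture, here is a minimal sketch of that idea, assuming the TF 1.x tf.train.init_from_checkpoint flow that run_pretraining.py uses (the function name and wiring are illustrative, not the exact code being linked to):

# Minimal sketch (TF 1.x): build an assignment map that restores only the
# variables the checkpoint actually contains, so the missing Adam slots
# (adam_m / adam_v) fall back to fresh initialization instead of being
# restored. Illustrative, not the exact ELECTRA code.
import tensorflow as tf

def init_from_checkpoint_skipping_adam(checkpoint_path):
    ckpt_names = {name for name, _ in tf.train.list_variables(checkpoint_path)}
    assignment_map = {}
    for var in tf.global_variables():
        name = var.name.split(":")[0]
        if name in ckpt_names:  # adam_m / adam_v are absent, so skipped
            assignment_map[name] = var
    # Must run during graph construction, before the session initializes
    # variables.
    tf.train.init_from_checkpoint(checkpoint_path, assignment_map)

The key point is that init_from_checkpoint restores only what is in the assignment map, so anything left out of the map (here, the Adam slots) simply uses its normal initializer.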

@w5688414

I'm running into the same problem.

@ghost commented Sep 22, 2020

Hello, I looked at the solution @clarkkev mentioned above, but I still don't see the exact fix. I am new to TensorFlow and could not find where the adam_m parameters are skipped in the linked code. Can anyone provide further help? Thank you in advance.

@Veyronl commented Dec 9, 2020

I have the same problem. Could you tell me how to fix it? @clarkkev @lincoln-jiang @w5688414 Thank you in advance.

@DayuanJiang (Author)

@Veyronl I just gave up on using ELECTRA.
