'adam_m not found in checkpoint' when further pretraining #45

Closed

DayuanJiang opened this issue Apr 18, 2020 · 6 comments

Comments

@DayuanJiang

When I was trying to further pretrain the model on domain-specific data in Colab, I ran into a problem: the official pretrained model could not be loaded.

Here is the command for further pretraining.

hparam =    '{"model_size": "small", \
             "use_tpu":true, \
             "num_tpu_cores":8, \
             "tpu_name":"grpc://10.53.161.26:8470", \
             "num_train_steps":4000100,\
             "pretrain_tfrecords":"gs://tweet_torch/electra/electra/data/pretrain_tf_records/pretrain_data.tfrecord*", \
             "model_dir":"gs://tweet_torch/electra/electra/data/electra_small/", \
             "generator_hidden_size":1.0\
            }'
!python electra/run_pretraining.py  \
                    --data-dir "gs://tweet_torch/electra/electra/data/" \
                    --model-name "electra_small" \
                    --hparams '{hparam}'

The error message is quite long, so I am pasting only the part that seems relevant.

ERROR:tensorflow:Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
From /job:worker/replica:0/task:0:
Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
	 [[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
@pnhuy commented Apr 26, 2020

I also had the same problem.

It seems that the adam_m variables were removed from the checkpoint before it was saved (google-research/bert#99 (comment)).

So without the full checkpoint, we can't resume training as-is.

Just waiting for the full checkpoint.
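
You can confirm this by listing what the released checkpoint actually contains, e.g. with tf.train.list_variables (a minimal sketch; the checkpoint path below is a placeholder, point it at your copy):

# Sketch: inspect the released checkpoint to confirm that the Adam slot
# variables (*/adam_m, */adam_v) were stripped before release.
# TF 1.x, as used in this thread; the path is a placeholder.
import tensorflow as tf

ckpt = "electra_small/model.ckpt"  # placeholder path
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)
# The model weights are listed, but no */adam_m or */adam_v keys appear,
# which is exactly why save/RestoreV2 fails during further pretraining.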

@clarkkev (Collaborator)

You should be able to do further training; just don't initialize the Adam parameters from the checkpoint, by doing something like this. I don't think refreshing the Adam parameters will cause any real problem with the model.
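
For a concrete picture, here is a minimal sketch of that idea, assuming the TF 1.x tf.train.init_from_checkpoint flow that run_pretraining.py uses (the function name and wiring are illustrative, not the exact code being linked to):

# Minimal sketch (TF 1.x): build an assignment map that restores only the
# variables the checkpoint actually contains, so the missing Adam slots
# (adam_m / adam_v) fall back to fresh initialization instead of being
# restored. Illustrative, not the exact ELECTRA code.
import tensorflow as tf

def init_from_checkpoint_skipping_adam(checkpoint_path):
    ckpt_names = {name for name, _ in tf.train.list_variables(checkpoint_path)}
    assignment_map = {}
    for var in tf.global_variables():
        name = var.name.split(":")[0]
        if name in ckpt_names:  # adam_m / adam_v are absent, so skipped
            assignment_map[name] = var
    # Must run during graph construction, before the session initializes
    # variables.
    tf.train.init_from_checkpoint(checkpoint_path, assignment_map)

The key point is that init_from_checkpoint restores only what is in the assignment map, so anything left out of the map (here, the Adam slots) simply uses its normal initializer.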

@w5688414

I'm running into the same problem.

@ghost commented Sep 22, 2020

Hello, I looked at the solution @clarkkev mentioned above, but I still don't see the exact fix. I am new to TensorFlow and could not find where the adam_m parameters are skipped in the linked code. Can anyone provide further help? Thank you in advance.

@Veyronl commented Dec 9, 2020

I have the same problem. Could you tell me how to fix it? @clarkkev @lincoln-jiang @w5688414 Thank you in advance.

@DayuanJiang (Author)

@Veyronl I just gave up on using ELECTRA.
