I get a shape mismatch error when running t5_mesh_transformer, for both training and fine-tuning.
Following is an example fine-tuning run, using a sample WMT TSV file:
ERROR:tensorflow:Error recorded from training_loop: Shape of variable decoder/block_000/layer_000/SelfAttention/k:0 ((768, 768)) doesn't match with shape of tensor decoder/block_000/layer_000/SelfAttention/k ([1024, 16384]) from checkpoint reader.
E1110 22:05:58.563133 140034595272448 error_handling.py:75] Error recorded from training_loop: Shape of variable decoder/block_000/layer_000/SelfAttention/k:0 ((768, 768)) doesn't match with shape of tensor decoder/block_000/layer_000/SelfAttention/k ([1024, 16384]) from checkpoint reader.
INFO:tensorflow:training_loop marked as finished
I1110 22:05:58.563437 140034595272448 error_handling.py:101] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W1110 22:05:58.563559 140034595272448 error_handling.py:135] Reraising captured error
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1191, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
config)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3159, in _model_fn
_train_on_tpu_system(ctx, model_fn_wrapper, dequeue_fn))
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3604, in _train_on_tpu_system
device_assignment=ctx.device_assignment)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/tpu/tpu.py", line 1277, in split_compile_and_shard
name=name)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/tpu/tpu.py", line 992, in split_compile_and_replicate
outputs = computation(*computation_inputs)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3589, in multi_tpu_train_steps_on_single_shard
inputs=[0, _INITIAL_LOSS])
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/tpu/training_loop.py", line 178, in while_loop
condition_wrapper, body_wrapper, inputs, name="", parallel_iterations=1)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/ops/control_flow_ops.py", line 2753, in while_loop
return_same_structure)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/ops/control_flow_ops.py", line 2245, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/ops/control_flow_ops.py", line 2170, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/tpu/training_loop.py", line 121, in body_wrapper
outputs = body(*(inputs + dequeue_ops))
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3588, in <lambda>
lambda i, loss: [i + 1, single_tpu_train_step(i)],
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1715, in train_step
self._call_model_fn(features, labels))
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1994, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/mesh_tensorflow/transformer/utils.py", line 567, in my_model_fn
init_checkpoint, {v: v for v in restore_vars}
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
init_from_checkpoint_fn)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1940, in merge_call
return self._merge_call(merge_fn, args, kwargs)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1947, in _merge_call
return merge_fn(self._strategy, *args, **kwargs)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 286, in <lambda>
ckpt_dir_or_file, assignment_map)
File "/home/nasrinm/anaconda3/envs/t5/lib/python3.6/site-packages/tensorflow_core/python/training/checkpoint_utils.py", line 329, in _init_from_checkpoint
tensor_name_in_ckpt, str(variable_map[tensor_name_in_ckpt])
ValueError: Shape of variable decoder/block_000/layer_000/SelfAttention/k:0 ((768, 768)) doesn't match with shape of tensor decoder/block_000/layer_000/SelfAttention/k ([1024, 16384]) from checkpoint reader.
Note that the error is reproducible with any of the pretrained models, not just the 11B-parameter one.
Hi, I think the issue is that your command includes both the --gin_file="gs://t5-data/pretrained_models/11B/operative_config.gin" and --gin_file="models/bi_v1.gin" flags. The latter overrides the values from the pretrained operative config, so the model hyperparameters no longer match the pretrained checkpoint. Looking at the README, I can see that this is not explained at all; sorry about that. Can you try running without the second flag to confirm that this is the issue you're facing? Then we can update the README. Thanks.
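To make the override effect concrete, here is a minimal sketch in plain Python. It does not use gin or Mesh TensorFlow: the dict merge only stands in for "a later config file overrides an earlier one", and the helper names (`merge_bindings`, `k_projection_shape`) are hypothetical. The hyperparameter values are assumptions chosen to reproduce the two shapes in the error message; they are not read from the actual gin files.

```python
def merge_bindings(*configs):
    """Merge config dicts in order; later configs override earlier ones."""
    merged = {}
    for config in configs:
        merged.update(config)
    return merged


def k_projection_shape(hparams):
    """Shape of a self-attention key projection: (d_model, num_heads * d_kv)."""
    return (hparams["d_model"], hparams["num_heads"] * hparams["d_kv"])


# Assumed values, picked so the shapes match those in the error message.
pretrained_config = {"d_model": 1024, "num_heads": 128, "d_kv": 128}
second_config = {"d_model": 768, "num_heads": 12, "d_kv": 64}

# The checkpoint was written with the pretrained hyperparameters.
checkpoint_shape = k_projection_shape(pretrained_config)

# The model graph is built from the merged config, where the second file wins.
model_shape = k_projection_shape(merge_bindings(pretrained_config, second_config))

if model_shape != checkpoint_shape:
    print(f"Shape of variable {model_shape} doesn't match checkpoint {checkpoint_shape}")
```

With only the pretrained config passed to `merge_bindings`, the two shapes agree, which is the situation the suggestion above aims for.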