LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/ffn_1/intermediate/output/dense/einsum/Einsum' (op type: Einsum) #94
Comments
I faced the same issue with Hub 2; the workaround is to use Hub 1.
FYI, Rachnas means using version 1 of the base model rather than version 2. If someone finds a way to use version 2, please tell us the secret!
Thank you for the advice, Rachnas. It worked with Hub 1. However, I am still wondering how to make it work with Hub 2. :)
astrongstorm, Rachnas, have you been able to get reasonable results from any training? Even when I repeat the same example they have provided, I get pretty bad results.
You need to specify vocab_file or spm_model_file (the sentencepiece tokenization model) as command-line arguments. Note: this only works for Hub 1.
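For illustration, the extra flag might look like the following command (the data path and output directory are placeholders, and the sentencepiece filename 30k-clean.model is an assumption; --spm_model_file is the flag the ALBERT scripts define):
python3 -m run_classifier_with_tfhub --data_dir=../../DataSet/CoLA/ --task_name=cola --output_dir=out_dir --spm_model_file=30k-clean.model --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 --do_train=True --do_eval=True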
I am yet to get results.
I got poor results with run_classifier_with_tfhub.py too: INFO:tensorflow:***** Eval results *****
For this problem, I believe we are talking about v2; there are some problems with tensor lookup on Hub 2, right?
I am facing the same issue with version 2, but it works fine with version 1 when spm_model_file is defined on the command line.
I'm getting bad results on both version 1 and version 2, though better on 1 than on 2. In my prior experience with other models, I found that LAMB was very sensitive to the hyperparameters. I'm thinking of trying Adam to see if that is the problem. Has anyone tried using Adam instead of LAMB to see if they get better results?
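For anyone who wants to try this, swapping in stock Adam in TF 1.x graph mode is straightforward. A minimal, self-contained sketch (the toy loss below just stands in for the model's loss):

```python
import tensorflow as tf  # TF 1.15, graph mode

# Stand-in loss; in practice this comes from the model.
w = tf.Variable(1.0)
loss = tf.square(w - 3.0)

# Swap LAMB for plain Adam as a sanity check on optimizer sensitivity.
optimizer = tf.train.AdamOptimizer(learning_rate=2e-5)
train_op = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(train_op)
```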
I am also having the same issue.
The training problem is still not solved even after using Hub 1 (version 1 of ALBERT). It gives the following error:
Have you solved the problem on v2? Could you share how to make it work?
The issue with Hub v2 modules is not fixed yet (v1 is good).
The "no gradient defined for operation Einsum" was found to be caused by using an old version of TF. The full investigation is here. I've modified requirements.txt to explicitly request TF 1.15. Please run pip install -r requirements.txt and verify that you are running TF 1.15. If you still see the problem, let me know by posting to this thread. BTW, I merged the TF-hub functionality into I tested this with TF1.15 using the v2 hub modules and it seems to be working at HEAD.
I am still seeing the same issue with TF 1.15 using the "run_classifier" command mentioned above. The v1 module works fine. LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/ffn_1/intermediate/output/dense/einsum/Einsum' (op type: Einsum)
With TensorFlow 1.15 we are still facing the same error.
Ah, now I'm able to reproduce it. There appears to be an issue with the way the V2 modules were generated. I'm looking into it with the TF team and will hopefully get back with an answer soon.
It looks like the V2 modules were generated with a different version of TF, which contains native ops not present in the TF 1.x releases. We will have to regenerate and re-release them with TF 1.15. Apologies for the inconvenience. I'll update this thread when the new modules are uploaded.
We have regenerated the hub modules using TF 1.15.
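Note that if the old v2 module was already downloaded, tensorflow_hub may keep serving the cached copy. One way to force a fresh download is to point TFHUB_CACHE_DIR at a new directory before loading (the path below is just an example):

```python
import os

# Must be set before tensorflow_hub resolves the handle.
os.environ["TFHUB_CACHE_DIR"] = "/tmp/tfhub_fresh_cache"

import tensorflow_hub as hub

module = hub.Module("https://tfhub.dev/google/albert_base/2", trainable=True)
```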
I am facing the same issue with traditional BERT on Colab. TF --> '1.15.0'. Code for loading BERT:
Exception thrown:
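For reference, loading a BERT/ALBERT module in TF 1.x typically looks like the sketch below. This is a minimal illustration, not the poster's original code; the handle and the trainable flag are assumptions:

```python
import tensorflow as tf   # TF 1.15
import tensorflow_hub as hub

# Hypothetical handle; substitute the module that triggers the error.
handle = "https://tfhub.dev/google/albert_base/2"

# In TF 1.x, hub.Module adds the module's ops and variables
# to the current default graph.
module = hub.Module(handle, trainable=True)

with tf.Session() as sess:
    # Modules ship pretrained weights that must be initialized.
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
```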
Does the hub module have multiple tags? If so, did you try any other? I faced a similar error with a different hub module; it turned out I was using the incorrect tag.
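For illustration, in the TF 1.x hub API the tag set is chosen when the module is instantiated. A minimal sketch, where the handle and the {"train"} tag set are assumptions that must match what the module actually exports:

```python
import tensorflow_hub as hub

# Some modules export multiple graph variants selected by tags
# (e.g. a training variant with dropout enabled).
module = hub.Module("https://tfhub.dev/google/albert_base/2",
                    trainable=True,
                    tags={"train"})
```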
I am using run_classifier_with_tfhub with --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2.
I am getting an error like "LookupError: No gradient defined for operation 'module_apply_tokens/bert/encoder/transformer/group_0_11/layer_11/inner_group_0/ffn_1/intermediate/output/dense/einsum/Einsum' (op type: Einsum)".
The arguments are:
python3 -m run_classifier_with_tfhub --data_dir=../../DataSet/CoLA/ --task_name=cola --output_dir=testing_ttt --vocab_file=vocab.txt --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2 --do_train=True --do_eval=True --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-05 --num_train_epochs=3.0
I am using tensorflow==1.15.0