Wrong order of values when calling bert.variables and fine tune after that #5

Closed
igeti opened this issue May 22, 2019 · 1 comment
Labels
bug Something isn't working

igeti commented May 22, 2019

Thank you very much for the article. It made me want to understand BERT more deeply, and while reading your code I found the following.
For fine-tuning, you use these lines:
trainable_vars = self.bert.variables
trainable_vars = trainable_vars[-self.n_fine_tune_layers:]
However, self.bert.variables returns the list sorted by variable name, so layer_10 and layer_11 come before layer_2 through layer_9. The slice takes variables from the end of that list, so fine-tuning reaches the intermediate layers (layer_9 and below) while the top layers, layer_10 and layer_11, stay completely frozen.
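
The effect can be reproduced without TensorFlow, since it depends only on how the names sort. A minimal sketch, assuming 12 encoder layers named as in the listing below; the `names` list and the `layer_index` helper are made up for illustration and are not part of the repository:

```python
import re

# Hypothetical stand-ins for the variable names: one name per encoder layer.
names = [
    f"BERT_module_1/bert/encoder/layer_{i}/output/dense/kernel:0"
    for i in range(12)
]

# Sorting by name is lexicographic, so layer_10 and layer_11
# land between layer_1 and layer_2.
by_name = sorted(names)


def layer_index(name):
    """Return the encoder layer number embedded in a variable name."""
    return int(re.search(r"/layer_(\d+)/", name).group(1))


print([layer_index(n) for n in by_name])
# [0, 1, 10, 11, 2, 3, 4, 5, 6, 7, 8, 9]

# The tail of the name-sorted list misses the two top layers.
print([layer_index(n) for n in by_name[-3:]])
# [7, 8, 9]

# Sorting by the numeric layer index restores depth order.
by_depth = sorted(names, key=layer_index)
print([layer_index(n) for n in by_depth[-3:]])
# [9, 10, 11]
```

Sorting with a numeric key before slicing is only one possible fix; filtering variables by layer number would work as well.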

bert.variables returns:

 <tf.Variable 'BERT_module_1/bert/embeddings/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/position_embeddings:0' shape=(512, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/word_embeddings:0' shape=(119547, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/pooler/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/pooler/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/output_bias:0' shape=(119547,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/dense/kernel:0' shape=(768, 768) dtype=float32>]
jacobzweig added the bug label May 23, 2019
jacobzweig self-assigned this May 23, 2019
zabithameed commented

Dear kkkyan, please refer to this line:
layer_no = int((var.name.split("/")[3]).split("_")[-1])

The error I get is:
ValueError: invalid literal for int() with base 10: 'encoder'
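
The 'encoder' in the message suggests that, in this setup, the variable names carry one more leading scope than the names listed above, so index 3 of the split lands on `encoder` instead of `layer_N`. That is an inference from the error text, not something confirmed in this thread. A sketch that avoids relying on a fixed position by searching for the `layer_N` component; the `layer_number` helper and the `outer_scope` prefix are hypothetical:

```python
import re

LAYER_RE = re.compile(r"/layer_(\d+)/")


def layer_number(var_name):
    """Return the encoder layer number in var_name, or None if it has none."""
    match = LAYER_RE.search(var_name)
    return int(match.group(1)) if match else None


# Name as in the listing above: index 3 of the split is 'layer_11'.
plain = "BERT_module_1/bert/encoder/layer_11/output/dense/kernel:0"
# Assumed extra leading scope: index 3 of the split is now 'encoder'.
scoped = "outer_scope/" + plain
# Non-encoder variable: there is no layer component at all.
pooler = "BERT_module_1/bert/pooler/dense/kernel:0"

print(layer_number(plain))   # 11
print(layer_number(scoped))  # 11
print(layer_number(pooler))  # None
```

Returning `None` for variables outside the encoder lets the caller decide whether the embeddings, pooler, and cls variables should be trained or frozen.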
