Wrong order of values when calling bert.variables and fine tune after that #5

igeti · 2019-05-22T14:17:31Z

Thank you very much for the article. While trying to understand BERT more deeply, I found the following issue in your code.
For fine-tuning, you use the following lines:
trainable_vars = self.bert.variables
trainable_vars = trainable_vars[-self.n_fine_tune_layers:]
However, self.bert.variables returns the list sorted by variable name, so layer_10 and layer_11 come before layer_2 through layer_9, and the list ends with the layer_9, pooler, and cls variables. As a result, the slice takes variables from the end of the sorted list rather than from the last transformer blocks, and fine-tuning trains an intermediate layer while the final blocks stay completely frozen.
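The ordering problem can be reproduced without TensorFlow. The following is a minimal sketch, assuming variable names that follow the pattern in the listing below; `layer_index` and `LAYER_RE` are hypothetical helpers, not part of the repository's code.

```python
import re

# Hypothetical names, one per encoder block, following the pattern in this issue.
names = [
    f"BERT_module_1/bert/encoder/layer_{i}/output/dense/kernel:0"
    for i in range(12)
]

# Matches the block number in a path component such as ".../layer_11/...".
LAYER_RE = re.compile(r"/layer_(\d+)/")


def layer_index(name):
    """Return the encoder block number in a variable name, or None if absent."""
    match = LAYER_RE.search(name)
    return int(match.group(1)) if match else None


# Plain string sorting compares character by character, so "layer_10" and
# "layer_11" sort before "layer_2".
lex_order = [layer_index(n) for n in sorted(names)]

# Sorting on the parsed integer restores the architectural order.
num_order = [layer_index(n) for n in sorted(names, key=layer_index)]

print(lex_order)
print(num_order)
```

With name-sorted input, a slice such as `[-n:]` therefore ends on `layer_9` rather than `layer_11`, which is the behaviour described above.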

bert.variables returns:

 <tf.Variable 'BERT_module_1/bert/embeddings/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/position_embeddings:0' shape=(512, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/embeddings/word_embeddings:0' shape=(119547, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_0/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_1/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_10/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_11/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_2/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_3/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_4/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_5/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_6/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_7/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_8/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/output/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/key/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/key/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/query/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/query/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/value/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/attention/self/value/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/intermediate/dense/bias:0' shape=(3072,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/intermediate/dense/kernel:0' shape=(768, 3072) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/encoder/layer_9/output/dense/kernel:0' shape=(3072, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/pooler/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/bert/pooler/dense/kernel:0' shape=(768, 768) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/output_bias:0' shape=(119547,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/LayerNorm/beta:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/LayerNorm/gamma:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/dense/bias:0' shape=(768,) dtype=float32>,
 <tf.Variable 'BERT_module_1/cls/predictions/transform/dense/kernel:0' shape=(768, 768) dtype=float32>]


zabithameed · 2019-05-28T14:51:21Z

Dear kkkyan, please check this line:
layer_no = int((var.name.split("/")[3]).split("_")[-1])

The error I get is:
ValueError: invalid literal for int() with base 10: 'encoder'
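One way this error can arise is when the variable name carries an extra leading scope, so that position 3 holds `encoder` instead of `layer_N`. The sketch below assumes such a name (the `bert_layer/` prefix is hypothetical) and shows a position-independent alternative; `select_last_layers` is an illustrative helper, not the fix that was merged.

```python
import re

# Matches the block number wherever ".../layer_N/..." appears in the name.
LAYER_RE = re.compile(r"/layer_(\d+)/")

# Hypothetical names with one extra leading scope ("bert_layer/").
names = [
    f"bert_layer/BERT_module/bert/encoder/layer_{i}/output/dense/kernel:0"
    for i in range(12)
] + ["bert_layer/BERT_module/bert/pooler/dense/kernel:0"]

# Positional parsing: index 3 is "encoder" here, so int() fails.
try:
    int(names[0].split("/")[3].split("_")[-1])
    positional_error = None
except ValueError as err:
    positional_error = str(err)


def select_last_layers(var_names, n_layers):
    """Keep names that belong to the n_layers highest-numbered encoder blocks."""
    indexed = []
    for name in var_names:
        match = LAYER_RE.search(name)
        if match:  # embeddings, pooler, and cls variables have no block number
            indexed.append((int(match.group(1)), name))
    if not indexed or n_layers <= 0:
        return []
    top = max(index for index, _ in indexed)
    return [name for index, name in indexed if index > top - n_layers]


selected = select_last_layers(names, 2)
print(positional_error)
print(selected)
```

In the real model the same filter would be applied to `var.name` for each entry of `self.bert.variables`; whether the pooler should also be unfrozen is a separate choice that this sketch leaves out.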

jacobzweig added the bug Something isn't working label May 23, 2019

jacobzweig self-assigned this May 23, 2019

kkkyan mentioned this issue May 28, 2019

fix issue n_layer_fine_tune #6

Closed

kurtjanssensai mentioned this issue Jun 6, 2019

Fix unfreezing last layers #9

Closed

jacobzweig closed this as completed in a157b09 Jun 6, 2019
