Implement Pooler layer in BertModelLayer #82
Implement the Pooler layer from the BERT model architecture, which creates a pooled feature vector from the first token of the output sequence. Many online blogs and examples take the pooled output from BERT directly and add dense layers (or other layers) on top of it.
With this change, the pooler layer weights available in the downloaded checkpoint files of various models can also be loaded into the BertModelLayer object.
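For reviewers unfamiliar with the pooler, here is a minimal plain-Python sketch of the computation as it is usually described for BERT: take the hidden state of the first token and pass it through a dense layer with a tanh activation. This is an illustration only, not the code in this PR. The function name `pool_first_token` and the toy values are hypothetical; `kernel` and `bias` stand in for the `bert/pooler/dense/kernel` and `bert/pooler/dense/bias` checkpoint weights.

```python
import math


def pool_first_token(sequence_output, kernel, bias):
    """Illustrative pooler: tanh(first_token @ kernel + bias).

    sequence_output: nested list shaped [batch][seq_len][hidden]
    kernel:          nested list shaped [hidden][hidden]
    bias:            list shaped [hidden]
    Returns a nested list shaped [batch][hidden].
    """
    pooled = []
    for tokens in sequence_output:
        first = tokens[0]  # hidden state of the first token only
        row = [
            math.tanh(sum(first[i] * kernel[i][j] for i in range(len(first))) + bias[j])
            for j in range(len(bias))
        ]
        pooled.append(row)
    return pooled


# Toy input: batch of 1, sequence of 3 tokens, hidden size 2 (hypothetical values).
sequence_output = [[[1.0, -1.0], [5.0, 5.0], [9.0, 9.0]]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix, so the dense layer is a no-op
bias = [0.0, 0.0]

pooled = pool_first_token(sequence_output, kernel, bias)
```

Only the first token contributes to the result, so the pooled vector has one row per batch item regardless of sequence length.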
Original Behaviour:
Done loading 37 BERT weights from: ~/Downloads/BERT/BERT-Weights/uncased_L-2_H-128_A-2/bert_model.ckpt into <bert.model.BertModelLayer object at 0x7f6a9c64df40> (prefix:bert_orig). Count of weights not found in the checkpoint was: [0]. Count of weights with mismatched shape: [0]
Unused weights from checkpoint:
bert/pooler/dense/bias
bert/pooler/dense/kernel
cls/predictions/output_bias
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/kernel
cls/seq_relationship/output_bias
cls/seq_relationship/output_weights
Modified Behaviour:
Done loading 39 BERT weights from: ~/Downloads/BERT/BERT-Weights/uncased_L-2_H-128_A-2/bert_model.ckpt into <bert.model.BertModelLayer object at 0x7f6a9d026a30> (prefix:bert_pooled). Count of weights not found in the checkpoint was: [0]. Count of weights with mismatched shape: [0]
Unused weights from checkpoint:
cls/predictions/output_bias
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/kernel
cls/seq_relationship/output_bias
cls/seq_relationship/output_weights
To get the pooler layer output, set return_pooler_output on the params before creating the BertModelLayer:
bert_params.return_pooler_output = True
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")