Can not load pretrained bert weights when loading chinese_L-12_H-768_A-12/bert_model.ckpt #80

Closed
yangxudong opened this issue Oct 21, 2020 · 3 comments

Comments

@yangxudong

Here is my code snippet:

import numpy as np
import bert
from tensorflow.keras.layers import Input, Masking, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

max_seq_len = 128
bert_params = bert.params_from_pretrained_ckpt(model_dir)
bert_layer = bert.BertModelLayer.from_params(bert_params, name="bert")

input_ids = Input(shape=(max_seq_len,), dtype='int32')
masked = Masking(mask_value=0)(input_ids)
emb = bert_layer.embeddings_layer(masked)  # shape: (None, seq_len, emb_size)
...
mask_ids = np.expand_dims(np.tile(np.array(tokenizer.convert_tokens_to_ids(["[MASK]"])), max_seq_len), 0)
emb_mask = bert_layer.embeddings_layer(mask_ids)  # shape: (1, seq_len, emb_size)
new_emb = err_prob * emb_mask + (1. - err_prob) * emb  # broadcast, shape: (None, seq_len, emb_size)
output = bert_layer.encoders_layer(new_emb)  # bert_layer expects input_ids, not already-embedded data
output = Dense(num_classes, activation='softmax')(output + emb)
correct_model = Model(input_ids, output)
correct_model.build(input_shape=(None, max_seq_len))
bert.load_bert_weights(bert_layer, model_ckpt)
correct_model.compile(optimizer=Adam(1e-3))
correct_model.summary()

When I run it, I get the following error while loading the pretrained weights. Can anyone help me? Thanks!

Traceback (most recent call last):
  File "/Users/weisu.yxd/PycharmProjects/PY3/soft_mask_bert.py", line 103, in <module>
    bert.load_bert_weights(bert_layer, model_ckpt)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/bert/loader.py", line 206, in load_stock_weights
    prefix = bert_prefix(bert)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/bert/loader.py", line 186, in bert_prefix
    assert match, "Unexpected bert layer: {} weight:{}".format(bert, bert.weights[0].name)
AssertionError: Unexpected bert layer: <bert.model.BertModelLayer object at 0x102d56be0> weight:embeddings/word_embeddings/embeddings:0
@kpe
Owner

kpe commented Oct 21, 2020

I'm not able to reproduce this. Could you post a minimal but complete executable example, i.e. something like:

import os
import bert

from tensorflow import keras

model_name = "chinese_L-12_H-768_A-12"
model_dir = bert.fetch_google_bert_model(model_name, ".models")
model_ckpt = os.path.join(model_dir, "bert_model.ckpt")

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

# use in Keras Model here, and call model.build()
model = keras.models.Sequential([
    keras.layers.InputLayer(input_shape=(128,)),
    l_bert,
    keras.layers.Lambda(lambda x: x[:, 0, :]),
    keras.layers.Dense(2)
])
model.build(input_shape=(None, 128))

bert.load_bert_weights(l_bert, model_ckpt)
model.summary()

@kpe
Owner

kpe commented Oct 21, 2020

ohh... I see, you are trying to replace/extend the default embeddings layer, cool!

I believe it's because the l_bert instance is not part of the graph, or rather, because the weights get instantiated here (outside of any context/scope):

output = bert_layer.encoders_layer(new_emb)  

the prefix/name_scope is missing. As a workaround, you could put the relevant pieces (or everything) in a name_scope like this:

from tensorflow.python.keras import backend as K

# https://github.com/tensorflow/tensorflow/issues/27298
with K.get_graph().as_default(), K.name_scope('bert'):
  emb_mask = bert_layer.embeddings_layer(mask_ids)  # shape(1, seq_len, emb_size)
  output = bert_layer.encoders_layer(new_emb)
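
With the name_scope in place, the layer's weight names should pick up the leading scope that the loader checks for; a quick way to verify is to print the first weight name before loading, which should now start with a scope (e.g. something like bert/embeddings/word_embeddings/embeddings:0 rather than the unprefixed embeddings/word_embeddings/embeddings:0 from the assertion above):

# quick sanity check: the first weight name is what the loader matches on
print(bert_layer.weights[0].name)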

as a minimal example:

import os
import bert

from tensorflow import keras
from tensorflow.python.keras import backend as K

model_name = "chinese_L-12_H-768_A-12"
model_dir = bert.fetch_google_bert_model(model_name, ".models")
model_ckpt = os.path.join(model_dir, "bert_model.ckpt")

bert_params = bert.params_from_pretrained_ckpt(model_dir)

# https://github.com/tensorflow/tensorflow/issues/27298
with K.get_graph().as_default(), K.name_scope('bert'):
    l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")
    inp_ids = keras.layers.Input(shape=(128,), dtype='int32')
    new_emb = l_bert.embeddings_layer(inp_ids)
    output = l_bert.encoders_layer(new_emb)
    output = keras.layers.Dense(3, activation='softmax')(output + new_emb)
    model = keras.models.Model(inp_ids, output, name='bert')

bert.load_bert_weights(l_bert, model_ckpt)
model.summary()

As an alternative, consider extending BertModelLayer and overriding the relevant methods (i.e. call(), build(), ...), as sketched below.
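
A rough, untested sketch of that alternative, using only the embeddings_layer/encoders_layer calls already shown in this thread (the err_prob input and the mask token id are placeholders taken from the snippet above):

import bert
import tensorflow as tf

class SoftMaskedBertLayer(bert.BertModelLayer):
    # sketch: mix the [MASK] embedding into the input embeddings inside the
    # layer's own scope, so the weight prefix stays intact
    def call(self, inputs, mask=None, training=None):
        input_ids, err_prob = inputs              # err_prob: (batch, seq_len, 1), placeholder
        mask_id = 103                             # placeholder; tokenizer.convert_tokens_to_ids(["[MASK]"])[0]
        emb = self.embeddings_layer(input_ids)    # (batch, seq_len, emb_size)
        emb_mask = self.embeddings_layer(tf.fill(tf.shape(input_ids), mask_id))
        new_emb = err_prob * emb_mask + (1. - err_prob) * emb
        return self.encoders_layer(new_emb)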

@kpe kpe closed this as completed Oct 21, 2020
@yangxudong
Author

the prefix/name_scope is missing

Thanks for your reply. Yes, it was indeed the missing prefix/name_scope. Your example works. Cool!
