# bert4keras
by 苏剑林 (Su Jianlin)

https://github.com/bojone/bert4keras

https://bert4keras.spaces.ac.cn/

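## Features

* Load pretrained bert/roberta/albert weights for fine-tuning;
* Implement the attention masks required for language models and seq2seq;
* A rich set of examples: https://github.com/bojone/bert4keras/tree/master/examples
* Code for pretraining from scratch (supports TPU and multiple GPUs; see pretraining);
* Compatible with keras and tf.keras.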
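
To make the attention-mask feature more concrete, here is a minimal NumPy sketch of the two mask shapes involved: a causal mask for a left-to-right language model, and a UniLM-style seq2seq mask derived from segment ids. The function names `lm_attention_mask` and `seq2seq_attention_mask` are hypothetical and illustrate the idea only; they are not the library's API.

```python
import numpy as np


def lm_attention_mask(seq_len):
    """Causal (left-to-right) mask: position i may attend to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=int))


def seq2seq_attention_mask(segment_ids):
    """UniLM-style mask built from segment ids (0 = source, 1 = target).

    Source tokens attend to the whole source; target tokens attend to the
    source plus the target tokens up to and including themselves.
    """
    idxs = np.cumsum(segment_ids)
    # mask[i, j] == 1 means position i may attend to position j.
    return (idxs[None, :] <= idxs[:, None]).astype(int)


print(lm_attention_mask(3))
print(seq2seq_attention_mask(np.array([0, 0, 0, 1, 1])))
```

In the second mask, the first three rows (the source) see only the three source positions, while each target row sees the source plus one more target position than the row before it.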


## Installation
Install the stable release:

    pip install bert4keras

Install the latest version:

    pip install git+https://www.github.com/bojone/bert4keras.git

## Weights

Weights that can currently be loaded:

* Google's original BERT: https://github.com/google-research/bert
* brightmart's RoBERTa: https://github.com/brightmart/roberta_zh
* HIT (Harbin Institute of Technology) RoBERTa: https://github.com/ymcui/Chinese-BERT-wwm
* Google's original ALBERT: https://github.com/google-research/ALBERT
* brightmart's ALBERT: https://github.com/brightmart/albert_zh
* Converted ALBERT: https://github.com/bojone/albert_zh
* Huawei's NEZHA: https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/NEZHA
* In-house language models (Zhuiyi Technology): https://github.com/ZhuiyiTechnology/pretrained-models
* T5: https://github.com/google-research/text-to-text-transfer-transformer
* GPT2_ML: https://github.com/imcaspar/gpt2-ml
* Google's original ELECTRA: https://github.com/google-research/electra
* HIT ELECTRA: https://github.com/ymcui/Chinese-ELECTRA
* CLUE ELECTRA: https://github.com/CLUEbenchmark/ELECTRA


In [1]:
# !pip install git+https://www.github.com/bojone/bert4keras.git
!pip freeze | grep keras

bert4keras==0.7.2


In [1]:
#! -*- coding: utf-8 -*-
# Check that the library imports correctly

from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer
import numpy as np

Using TensorFlow backend.


In [2]:
import tensorflow as tf

In [3]:
print(tf.__version__)

2.1.0


In [4]:
!pip freeze | grep Keras

Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0


In [5]:
!pip freeze | grep tensorflow

tensorflow==2.1.0
tensorflow-datasets==1.3.0
tensorflow-estimator==2.1.0
tensorflow-metadata==0.15.0


In [6]:
bert_dir = '/Users/luoyonggui/Documents/nlpdata/chinese_L-12_H-768_A-12'

In [7]:
config_path = f'{bert_dir}/bert_config.json'
checkpoint_path = f'{bert_dir}/bert_model.ckpt'
dict_path = f'{bert_dir}/vocab.txt'
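
Note that `bert_model.ckpt` is a checkpoint prefix rather than a single file: TensorFlow stores a checkpoint as several files such as `bert_model.ckpt.index` and `bert_model.ckpt.data-*`. A small sanity check before loading can save a confusing error later. The helper `missing_bert_files` below is a hypothetical name for this sketch and is not part of bert4keras.

```python
import glob
import os


def missing_bert_files(bert_dir):
    """Return the expected files that are absent from bert_dir."""
    missing = []
    for name in ('bert_config.json', 'vocab.txt'):
        if not os.path.isfile(os.path.join(bert_dir, name)):
            missing.append(name)
    # bert_model.ckpt is a prefix, so look for any file that starts with it.
    if not glob.glob(os.path.join(bert_dir, 'bert_model.ckpt.*')):
        missing.append('bert_model.ckpt.*')
    return missing
```

Calling `missing_bert_files(bert_dir)` returns an empty list when the config, vocabulary, and at least one checkpoint file are present.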

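# A simple example: encoding a sentence with the BERT base model

The code has two parts:
* The first part builds the tokenizer. bert4keras.tokenizers contains a complete reimplementation of the original BERT tokenizer, plus a few commonly used extras;
* The second part builds the BERT model. The main function is build_transformer_model, defined as follows: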

In [None]:
def build_transformer_model(
    config_path=None,  # model configuration file (JSON format)
    checkpoint_path=None,  # pretrained weights (TensorFlow ckpt format)
    model='bert',  # model type (bert, albert, albert_unshared, nezha, electra, gpt2_ml, t5)
    application='encoder',  # intended use (encoder, lm, unilm)
    return_keras_model=True,  # return a Keras model, or the bert4keras model class
    **kwargs  # other arguments passed through
):
    ...  # signature only; the body is omitted here

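The meaning of each build_transformer_model parameter is hard to convey in a few sentences, but in this 10-minute tutorial those details are not especially important, so they are omitted for now. The best way to learn a framework is to read examples, so users are encouraged to consult the examples provided on GitHub: https://github.com/bojone/bert4keras/tree/master/examples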

## Building the tokenizer

In [24]:
tokenizer = Tokenizer(dict_path)  # build the tokenizer
# Encoding test ('语言模型' means "language model")
token_ids, segment_ids = tokenizer.encode('语言模型')

In [25]:
token_ids, segment_ids

([101, 6427, 6241, 3563, 1798, 102], [0, 0, 0, 0, 0, 0])

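As shown, encoding returns two lists (token ids and segment ids). Each has length equal to the number of tokens in the sentence plus 2, because the tokenizer adds [CLS] (id 101) at the start and [SEP] (id 102) at the end.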
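
The structure of that output can be reproduced with a toy stand-in. This is a sketch only: `TOY_VOCAB` and `toy_encode` are hypothetical names, the ids are copied from the output above (assuming one token per Chinese character), and the real tokenizer handles far more cases, such as WordPiece splitting and sentence pairs.

```python
# Toy vocabulary: ids copied from the output above. The real vocab.txt
# contains many more entries.
TOY_VOCAB = {
    '[CLS]': 101, '[SEP]': 102,
    '语': 6427, '言': 6241, '模': 3563, '型': 1798,
}


def toy_encode(text, vocab=TOY_VOCAB):
    """Hypothetical stand-in for encoding a single Chinese sentence."""
    tokens = ['[CLS]'] + list(text) + ['[SEP]']
    token_ids = [vocab[t] for t in tokens]
    segment_ids = [0] * len(token_ids)  # a single sentence is all segment 0
    return token_ids, segment_ids


print(toy_encode('语言模型'))
```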

In [38]:
# The encoded length can be capped with max_length
tokenizer.encode('语言模型', max_length=4)

([101, 6427, 6241, 102], [0, 0, 0, 0])

In [39]:
# A max_length larger than the encoded sentence leaves it unchanged
tokenizer.encode('语言模型', max_length=40)

([101, 6427, 6241, 3563, 1798, 102], [0, 0, 0, 0, 0, 0])
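
The two outputs above are consistent with a truncation rule that keeps [CLS] and [SEP] and drops tokens just before the final [SEP] until the list fits. The sketch below implements that rule under this assumption; `truncate_ids` is a hypothetical helper, and the library's actual truncation logic may differ in detail.

```python
def truncate_ids(token_ids, max_length):
    """Drop tokens just before the final [SEP] until the list fits max_length."""
    if max_length < 2:
        raise ValueError('max_length must leave room for [CLS] and [SEP]')
    ids = list(token_ids)
    while len(ids) > max_length:
        ids.pop(-2)  # remove the token right before [SEP]
    return ids


full = [101, 6427, 6241, 3563, 1798, 102]
print(truncate_ids(full, 4))   # [101, 6427, 6241, 102]
print(truncate_ids(full, 40))  # [101, 6427, 6241, 3563, 1798, 102]
```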

## Building the model and loading weights

### Returning a Keras model

In [27]:
model = build_transformer_model(config_path, checkpoint_path,
                                return_keras_model=True)  # build the model and load the weights

In [28]:
type(model)

keras.engine.training.Model

In [33]:
# Get BERT's inputs (token ids and segment ids)
model.inputs

[<tf.Tensor 'Input-Token_4:0' shape=(None, None) dtype=float32>,
 <tf.Tensor 'Input-Segment_4:0' shape=(None, None) dtype=float32>]

In [17]:
# Get BERT's output
model.output

<tf.Tensor 'Transformer-11-FeedForward-Norm_1/add_1:0' shape=(None, None, 768) dtype=float32>

In [19]:
# Number of layers in BERT's layer list
len(model.layers)

104

In [21]:
model.layers

[<keras.engine.input_layer.InputLayer at 0x13db727f0>,
 <keras.engine.input_layer.InputLayer at 0x13db727b8>,
 <bert4keras.layers.Embedding at 0x13db72710>,
 <bert4keras.layers.Embedding at 0x13db726d8>,
 <keras.layers.merge.Add at 0x13db723c8>,
 <bert4keras.layers.PositionEmbedding at 0x13db725c0>,
 <bert4keras.layers.LayerNormalization at 0x13db72630>,
 <keras.layers.core.Dropout at 0x13db72550>,
 <bert4keras.layers.MultiHeadAttention at 0x13da8d8d0>,
 <keras.layers.core.Dropout at 0x1422f12e8>,
 <keras.layers.merge.Add at 0x1422f1358>,
 <bert4keras.layers.LayerNormalization at 0x13dac6f98>,
 <bert4keras.layers.FeedForward at 0x13dabfc18>,
 <keras.layers.core.Dropout at 0x13dd59f98>,
 <keras.layers.merge.Add at 0x13dab0e80>,
 <bert4keras.layers.LayerNormalization at 0x1469980b8>,
 <bert4keras.layers.MultiHeadAttention at 0x1469986a0>,
 <keras.layers.core.Dropout at 0x146998198>,
 <keras.layers.merge.Add at 0x13dd40358>,
 <bert4keras.layers.LayerNormalization at 0x13dd402e8>,
 <bert4k

In [20]:
# Get BERT's second-to-last layer
model.layers[-2]

<keras.layers.merge.Add at 0x149c7af60>
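
The layer count of 104 can be checked against the layer listing above. The listing starts with 8 embedding-stage layers, then repeats a group of 8 layers per Transformer block. The short calculation below assumes 12 blocks, as the `L-12` in the checkpoint name suggests.

```python
num_hidden_layers = 12  # from the checkpoint name chinese_L-12_H-768_A-12

# 2 InputLayers, 2 Embeddings, Add, PositionEmbedding, LayerNormalization, Dropout
embedding_stage = 8

# MultiHeadAttention, Dropout, Add, LayerNormalization,
# FeedForward, Dropout, Add, LayerNormalization
per_block = 8

total_layers = embedding_stage + num_hidden_layers * per_block
print(total_layers)  # 104
```

This also matches `model.layers[-2]` being an `Add` layer: each block ends with `Add` followed by `LayerNormalization`.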

### Returning the wrapper class around the Keras model

In [29]:
bert = build_transformer_model(config_path, checkpoint_path,
                               return_keras_model=False)  # build the model and load the weights
type(bert)

bert4keras.models.BERT

#### Getting the keras.engine.training.Model from bert4keras.models.BERT

In [30]:
type(bert.model)

keras.engine.training.Model

In [32]:
# Get BERT's inputs (token ids and segment ids)
bert.inputs

[<tf.Tensor 'Input-Token_5:0' shape=(None, None) dtype=float32>,
 <tf.Tensor 'Input-Segment_5:0' shape=(None, None) dtype=float32>]

In [34]:
bert.initializer

<keras.initializers.TruncatedNormal at 0x1481d65c0>

## predict

In [35]:
r = model.predict([np.array([token_ids]), np.array([segment_ids])])

In [36]:
r.shape

(1, 6, 768)
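
The call above wraps a single sentence in a batch of size 1. To predict on several sentences of different lengths at once, the id lists first have to be padded to a common length so they stack into one array. The sketch below assumes 0 is the padding id, as in the Google BERT vocabularies; `pad_batch` is a hypothetical helper written for this illustration.

```python
import numpy as np


def pad_batch(sequences, pad_value=0):
    """Right-pad lists of ids to a common length and stack them into a 2-D array."""
    max_len = max(len(seq) for seq in sequences)
    return np.array(
        [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    )


batch_token_ids = pad_batch([
    [101, 6427, 6241, 3563, 1798, 102],
    [101, 6427, 6241, 102],
])
print(batch_token_ids.shape)  # (2, 6)
```

The segment ids would be padded the same way before both arrays are passed to `model.predict`.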

In [11]:
r

array([[[-0.63250965,  0.20302312,  0.07936583, ...,  0.49122566,
         -0.20493367,  0.25752527],
        [-0.7588357 ,  0.09651838,  1.0718755 , ..., -0.61096966,
          0.0431218 ,  0.03881414],
        [ 0.547703  , -0.79211694,  0.44435284, ...,  0.42449164,
          0.41105747,  0.08222783],
        [-0.29242492,  0.6052705 ,  0.49968675, ...,  0.86041355,
         -0.65331644,  0.5369077 ],
        [-0.7473448 ,  0.49431536,  0.7185178 , ...,  0.38486043,
         -0.7409052 ,  0.39056796],
        [-0.87413776, -0.21650389,  1.3388399 , ...,  0.5816858 ,
         -0.4373227 ,  0.56181794]]], dtype=float32)
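
The output holds one 768-dimensional vector per token, including [CLS] and [SEP]. Two common ways to reduce it to a single sentence vector are taking the vector at the [CLS] position or averaging over all positions. The sketch below uses a random placeholder `r_demo` with the same shape as `r`, so it runs without the model; in the notebook, `r` itself would be used.

```python
import numpy as np

# Placeholder with the same shape as r above (batch, tokens, hidden size).
rng = np.random.default_rng(0)
r_demo = rng.normal(size=(1, 6, 768)).astype('float32')

cls_vector = r_demo[:, 0]          # vector at the [CLS] position
mean_vector = r_demo.mean(axis=1)  # average over the 6 token positions

print(cls_vector.shape, mean_vector.shape)  # (1, 768) (1, 768)
```

Which reduction works better depends on the downstream task; neither is prescribed by the tutorial.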