A Keras solution for Chinese NER tasks using BiLSTM-CRF, BiGRU-CRF, IDCNN-CRF, or single-CRF models on top of BERT-family pretrained language models (BERT, RoBERTa, and ALBERT are supported). If it helps you, feel free to star this repository!



Future Work

This project is currently being migrated to TensorFlow 2.0, which should take a few days if my work is not too busy (lol).


This project can be installed via:

git clone
cd keras_bert_ner
python install

Alternatively, using pip:

pip install git+


pip install keras_bert_ner

To uninstall:

pip uninstall keras_bert_ner


Data Format

        "O O B I O O O B I O O O O B I O O O O O O O B I O O O O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I O O O B I O O O O B I O O O O O O O B I O O O O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I O O O B I O O O O O O O O O O O O O O O O O O O O O O O O O"
        "O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I I I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I I I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O B I I I O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O"

See ./examples/data/train.txt for a full example. Data source: 互联网金融新实体发现 (Internet Finance New Entity Discovery).
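The tag strings above can be decoded into entity positions with a few lines of Python. The following is a minimal sketch, assuming the tags are space-separated, limited to B/I/O, and aligned one-to-one with the characters of the sentence; `bio_spans` is a hypothetical helper, not part of keras_bert_ner.

```python
from typing import List, Tuple


def bio_spans(tag_line: str) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each entity."""
    tags = tag_line.split()
    spans = []
    start = None  # index where the current entity began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))  # close the previous entity
            start = i
        elif tag == "I":
            if start is None:
                start = i  # tolerate an entity that starts with "I"
        else:
            # "O" (or any unknown tag) closes the current entity.
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))  # entity runs to the end
    return spans


print(bio_spans("O O B I O O O B I I I"))  # prints [(2, 4), (7, 11)]
```

Slicing the original sentence with each `(start, end)` pair then yields the entity text.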


Simply run python keras_bert_ner/train/ --help to see the parameters for training a typical NER model. The output looks like this:

(nlp) liushaoweihua@ai-server-6:~/jupyterlab/Keras-Bert-Ner$ python keras_bert_ner/train/ --help
usage: [-h] -train_data TRAIN_DATA [-dev_data DEV_DATA]
               [-save_path SAVE_PATH] [-albert] -bert_config BERT_CONFIG
               -bert_checkpoint BERT_CHECKPOINT -bert_vocab BERT_VOCAB
               [-do_eval] [-device_map DEVICE_MAP] [-best_fit]
               [-max_epochs MAX_EPOCHS]
               [-early_stop_patience EARLY_STOP_PATIENCE]
               [-reduce_lr_patience REDUCE_LR_PATIENCE]
               [-reduce_lr_factor REDUCE_LR_FACTOR] [-hard_epochs HARD_EPOCHS]
               [-batch_size BATCH_SIZE] [-max_len MAX_LEN]
               [-learning_rate LEARNING_RATE] [-model_type MODEL_TYPE]
               [-cell_type CELL_TYPE] [-rnn_units RNN_UNITS]
               [-rnn_layers RNN_LAYERS] [-cnn_filters CNN_FILTERS]
               [-cnn_kernel_size CNN_KERNEL_SIZE] [-cnn_blocks CNN_BLOCKS]
               [-crf_only] [-dropout_rate DROPOUT_RATE]

More precisely:

Data File Paths:
  Config the train/dev/test file paths
  -train_data TRAIN_DATA                     (REQUIRED) Train data path
  -dev_data DEV_DATA                         (OPTIONAL) Dev data path. Needed when -do_eval=True

Model Output Paths:
  Config the output paths for model
  -save_path SAVE_PATH                       (OPTIONAL) Model output paths

BERT File paths:
  Config the path, checkpoint and filename of a pretrained or fine-tuned BERT model
  -albert                                    (OPTIONAL) Whether to use ALBERT model. Default is False
  -bert_config BERT_CONFIG                   (REQUIRED) bert_config.json
  -bert_checkpoint BERT_CHECKPOINT           (REQUIRED) bert_model.ckpt
  -bert_vocab BERT_VOCAB                     (REQUIRED) vocab.txt

Action Configs:
  Config the actions during running
  -do_eval                                   (OPTIONAL) Evaluation mode. Default is True
  -device_map DEVICE_MAP                     (OPTIONAL) Use CPU/GPU to train. If use CPU, then 'cpu'. 
                                             If use GPU, then assign the devices, such as '0'. Default 
                                             is 'cpu'

Train Configs:
  Config the train params
  -best_fit                                  (OPTIONAL) Train best model that suits for dev.txt. 
                                             Default is False
  -max_epochs MAX_EPOCHS                     (OPTIONAL) Training epochs. Only available when 
                                             -best_fit=True. Default is 256
  -early_stop_patience EARLY_STOP_PATIENCE   (OPTIONAL) Early stop patience. Only available when
                                             -best_fit=True. Default is 3
  -reduce_lr_patience REDUCE_LR_PATIENCE     (OPTIONAL) Reduce learning rate on plateau patience.
                                             Only available when -best_fit=True. Default is 2
  -reduce_lr_factor REDUCE_LR_FACTOR         (OPTIONAL) Reduce learning rate on plateau factor.
                                             Only available when -best_fit=True. Default is 0.5
  -hard_epochs HARD_EPOCHS                   (OPTIONAL) Training epochs. Only available when
                                             -best_fit=False. Default is 10
  -batch_size BATCH_SIZE                     (OPTIONAL) Batch size. Default is 64
  -max_len MAX_LEN                           (OPTIONAL) Max sequence length. Default is 64
  -learning_rate LEARNING_RATE               (OPTIONAL) Initial adam lr. Default is 1e-5

Model Configs:
  Config the model params
  -model_type MODEL_TYPE                     (OPTIONAL) RNN models or CNN models. Default is rnn
  -cell_type CELL_TYPE                       (OPTIONAL) Cell types. If model_type='rnn', could be
                                             bilstm or bigru. If model_type='cnn', could be idcnn.
                                             Default is bilstm
  -rnn_units RNN_UNITS                       (OPTIONAL) RNN units. Only available when model_type='rnn'.
                                             Default is 128
  -rnn_layers RNN_LAYERS                     (OPTIONAL) RNN layers. Only available when model_type='rnn'.
                                             Default is 1
  -cnn_filters CNN_FILTERS                   (OPTIONAL) CNN filters. Only available when model_type='cnn'.
                                             Default is 128
  -cnn_kernel_size CNN_KERNEL_SIZE           (OPTIONAL) CNN kernel size. Only available when model_type='cnn'.
                                             Default is 3
  -cnn_blocks CNN_BLOCKS                     (OPTIONAL) IDCNN blocks. Only available when model_type='cnn'.
                                             Default is 4
  -crf_only                                  (OPTIONAL) Only use CRF-layers after BERT. Default is False
  -dropout_rate DROPOUT_RATE                 (OPTIONAL) Dropout rate. Default is 0.0
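When the training script is launched from Python rather than from a shell, it can help to check an option set before starting a long run. The sketch below assembles the documented flags into an argument list and rejects a -cell_type that does not match -model_type. `build_train_args` and `VALID_CELLS` are hypothetical names introduced for illustration; the checks mirror the help text above and are an assumption about how the real script behaves.

```python
from typing import Dict, List, Union

# Cell types the help text lists for each -model_type value.
VALID_CELLS = {"rnn": {"bilstm", "bigru"}, "cnn": {"idcnn"}}
REQUIRED = ("train_data", "bert_config", "bert_checkpoint", "bert_vocab")


def build_train_args(options: Dict[str, Union[str, int, float, bool]]) -> List[str]:
    """Turn an options dict into a list of -name=value arguments."""
    for name in REQUIRED:
        if name not in options:
            raise ValueError(f"missing required option: -{name}")

    # With -crf_only there is no RNN/CNN layer, so the cell type is irrelevant.
    if not options.get("crf_only", False):
        model_type = options.get("model_type", "rnn")   # documented default
        cell_type = options.get("cell_type", "bilstm")  # documented default
        if model_type not in VALID_CELLS:
            raise ValueError(f"unknown model_type: {model_type!r}")
        if cell_type not in VALID_CELLS[model_type]:
            raise ValueError(
                f"cell_type {cell_type!r} does not match model_type {model_type!r}"
            )

    args = []
    for name, value in options.items():
        if value is True:
            args.append(f"-{name}")  # switch such as -best_fit or -albert
        elif value is False:
            continue  # an unset switch is simply omitted
        else:
            args.append(f"-{name}={value}")
    return args


print(build_train_args({
    "train_data": "train.txt",
    "bert_config": "albert_config_tiny.json",
    "bert_checkpoint": "albert_model.ckpt",
    "bert_vocab": "vocab.txt",
    "albert": True,
    "model_type": "cnn",
    "cell_type": "idcnn",
}))
```

The returned list can be appended to the interpreter and script path and passed to `subprocess.run`.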

Some Tips

If your pretrained language model is an ALBERT (Large/Base/Tiny), remember to add the -albert parameter.

If you do not want any downstream layers such as BiLSTM/BiGRU/IDCNN, simply add the -crf_only parameter.

To get the best training results, set the early-stopping and reduce-learning-rate parameters (see Train Configs), and do not forget to add the -best_fit parameter.
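To get a feel for how -early_stop_patience, -reduce_lr_patience, and -reduce_lr_factor interact, the following pure-Python sketch replays a list of per-epoch validation losses. It assumes the options behave like the usual early-stopping and reduce-on-plateau schemes and that validation loss is the monitored quantity; neither is confirmed by this README, and `simulate_best_fit` is a hypothetical helper.

```python
from typing import List, Tuple


def simulate_best_fit(
    val_losses: List[float],
    learning_rate: float = 1e-5,
    early_stop_patience: int = 3,
    reduce_lr_patience: int = 2,
    reduce_lr_factor: float = 0.5,
) -> Tuple[int, float]:
    """Return (epochs actually run, final learning rate)."""
    best = float("inf")
    stop_wait = 0  # epochs since the last improvement, for early stopping
    lr_wait = 0    # epochs since the last improvement or learning-rate cut
    epochs_run = 0
    for loss in val_losses:
        epochs_run += 1
        if loss < best:
            # Improvement: remember it and reset both counters.
            best = loss
            stop_wait = 0
            lr_wait = 0
            continue
        stop_wait += 1
        lr_wait += 1
        if lr_wait >= reduce_lr_patience:
            learning_rate *= reduce_lr_factor  # plateau: shrink the step size
            lr_wait = 0
        if stop_wait >= early_stop_patience:
            break  # no improvement for too long: stop training
    return epochs_run, learning_rate


# Defaults from Train Configs: patience 3 / 2, factor 0.5, lr 1e-5.
print(simulate_best_fit([1.0, 0.8, 0.9, 0.85, 0.81, 0.7]))  # prints (5, 5e-06)
```

In this run the learning rate is halved once after two stagnant epochs, and training stops after the third, so the final 0.7 is never reached. Setting the early-stop patience above the reduce-lr patience, as both the defaults and the templates do, gives the lower learning rate a chance to help before training ends.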


Examples can be seen in ./examples/train_example. Simply run bash to start training.

Here are two templates, one for RNN models and one for CNN models:


PRETRAINED_LM_DIR="/home1/liushaoweihua/pretrained_lm/albert_tiny_250k" # your pretrained language model path
DATA_DIR="../data" # your train/dev data path
OUTPUT_DIR="../models" # where to store the NER model

# RNN model: -cell_type can be "bilstm" or "bigru".
# -albert is set because the pretrained model here is ALBERT-Tiny.
# Comments cannot follow a trailing backslash, so they are kept up here.
python \
    -train_data=${DATA_DIR}/train.txt \
    -dev_data=${DATA_DIR}/dev.txt \
    -save_path=${OUTPUT_DIR} \
    -albert \
    -bert_config=${PRETRAINED_LM_DIR}/albert_config_tiny.json \
    -bert_checkpoint=${PRETRAINED_LM_DIR}/albert_model.ckpt \
    -bert_vocab=${PRETRAINED_LM_DIR}/vocab.txt \
    -device_map="0" \
    -best_fit \
    -max_epochs=256 \
    -early_stop_patience=5 \
    -reduce_lr_patience=3 \
    -reduce_lr_factor=0.5 \
    -batch_size=64 \
    -max_len=512 \
    -learning_rate=5e-5 \
    -model_type="rnn" \
    -cell_type="bilstm" \
    -rnn_units=128 \
    -rnn_layers=1 \
    -dropout_rate=0.1


PRETRAINED_LM_DIR="/home1/liushaoweihua/pretrained_lm/albert_tiny_250k" # your pretrained language model path
DATA_DIR="../data" # your train/dev data path
OUTPUT_DIR="../models" # where to store the NER model

# CNN model: -cell_type can be "idcnn".
# -albert is set because the pretrained model here is ALBERT-Tiny.
# Comments cannot follow a trailing backslash, so they are kept up here.
python \
    -train_data=${DATA_DIR}/train.txt \
    -dev_data=${DATA_DIR}/dev.txt \
    -save_path=${OUTPUT_DIR} \
    -albert \
    -bert_config=${PRETRAINED_LM_DIR}/albert_config_tiny.json \
    -bert_checkpoint=${PRETRAINED_LM_DIR}/albert_model.ckpt \
    -bert_vocab=${PRETRAINED_LM_DIR}/vocab.txt \
    -device_map="0" \
    -best_fit \
    -max_epochs=256 \
    -early_stop_patience=5 \
    -reduce_lr_patience=3 \
    -reduce_lr_factor=0.5 \
    -batch_size=64 \
    -max_len=512 \
    -learning_rate=5e-5 \
    -model_type="cnn" \
    -cell_type="idcnn" \
    -cnn_filters=128 \
    -cnn_kernel_size=3 \
    -cnn_blocks=4 \
    -dropout_rate=0.1

Logs in Training Phase



Both tag accuracy and sentence accuracy are printed during the training phase.
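The two metrics can be illustrated with a short sketch. It assumes tag accuracy is the fraction of individual tags predicted correctly and sentence accuracy is the fraction of sentences whose whole tag sequence is correct; the repository's own computation may differ, for example in how padding is masked. `tag_and_sentence_accuracy` is a hypothetical helper.

```python
from typing import List, Tuple


def tag_and_sentence_accuracy(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float]:
    """Return (tag accuracy, sentence accuracy) over aligned tag sequences."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    total_tags = correct_tags = correct_sentences = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("gold and predicted sequences differ in length")
        matches = sum(g == p for g, p in zip(gold_tags, pred_tags))
        total_tags += len(gold_tags)
        correct_tags += matches
        # A sentence counts only if every one of its tags is right.
        correct_sentences += matches == len(gold_tags)
    return correct_tags / total_tags, correct_sentences / len(gold)


gold = [["O", "B", "I"], ["O", "O"]]
pred = [["O", "B", "O"], ["O", "O"]]
print(tag_and_sentence_accuracy(gold, pred))  # prints (0.8, 0.5)
```

Sentence accuracy is the stricter of the two: one wrong tag in a long sentence barely moves tag accuracy but counts the whole sentence as wrong.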


Data Format


See ./examples/data/test.txt for a full example. Data source: 互联网金融新实体发现 (Internet Finance New Entity Discovery).


Simply run python keras_bert_ner/utils/ --help to see the relevant parameters. The output looks like this:

(nlp) liushaoweihua@ai-server-6:~/jupyterlab/Keras-Bert-Ner$ python keras_bert_ner/utils/ --help
usage: [-h] -test_data TEST_DATA [-max_len MAX_LEN] -model_path
               MODEL_PATH -model_name MODEL_NAME [-output_path OUTPUT_PATH]
               -bert_vocab BERT_VOCAB [-device_map DEVICE_MAP]

More precisely:

Data File Paths:
  Config the train/dev/test file paths
  -test_data TEST_DATA                       (REQUIRED) Test data path
  -max_len MAX_LEN                           (OPTIONAL) Max sequence length. Default is 64

Model Output Paths:
  Config the model paths
  -model_path MODEL_PATH                     (REQUIRED) Model path
  -model_name MODEL_NAME                     (REQUIRED) Model name

Output Paths:
  Config the output paths
  -output_path OUTPUT_PATH                   (OPTIONAL) Output file paths

BERT File paths:
  Config the vocab of a pretrained or fine-tuned BERT model
  -bert_vocab BERT_VOCAB                     (REQUIRED) vocab.txt

Action Configs:
  Config the actions during running
  -device_map DEVICE_MAP                     (OPTIONAL) Use CPU/GPU to train. If use CPU, then 'cpu'. 
                                             If use GPU, then assign the devices, such as '0'. Default 
                                             is 'cpu'


Examples can be seen in ./examples/test_example. Simply run bash to start testing.

Logs in Testing Phase




Examples can be seen in ./examples/deploy_example.

Simply run bash to start deploying an API.

Then run the notebook usage.ipynb, or open your_ip:2601/?s=your_text in a browser, to see the result.
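The same request can be sent from Python using only the standard library. This is a minimal sketch: the port 2601 and the `s` query parameter come from the URL above, while the `http` scheme, the UTF-8 response encoding, and the helper names `build_query_url` and `query_ner` are assumptions. The response format is not documented here, so the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(host: str, text: str, port: int = 2601) -> str:
    """Build the GET URL, percent-encoding the text as UTF-8."""
    query = urlencode({"s": text})
    return f"http://{host}:{port}/?{query}"


def query_ner(host: str, text: str, port: int = 2601, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    with urlopen(build_query_url(host, text, port), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL needs no running server.
print(build_query_url("127.0.0.1", "hello world"))
```

Encoding the text with `urlencode` matters for Chinese input, which must be percent-encoded before it can appear in a URL.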


Max Sequence Length: 512

Memory Usage (G) 3.72 0.89
Inference Time (ms) 180 300

Logs in Deploying Phase



Some Chinese Pretrained Language Models






The architecture of this repository follows macanv's work: BERT-BiLSTM-CRF-NER.

The most important component of keras_bert_ner builds on bojone's work: bert4keras.

The pretrained language model ALBERT-Tiny, the work of BrightMart, makes it possible to run NER tasks with short inference time and relatively high accuracy.

Thanks for all these wonderful works!