ZEN

ZEN is a BERT-based Chinese (Z) text encoder Enhanced by N-gram representations, where different combinations of characters are considered during training. Potential word and phrase boundaries are explicitly pre-trained and fine-tuned together with the character encoder (BERT), so that ZEN incorporates comprehensive information from both the character sequence and the words or phrases it contains. The structure of ZEN is illustrated in the figure below.

 

(Figure: the overall structure of the ZEN model)
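To make the design concrete, the snippet below is a minimal PyTorch sketch of the idea: n-gram representations are encoded separately and then added back onto the character positions they cover. All names here (NgramEnhancedLayer, ngram_to_char, and so on) are illustrative assumptions, not the classes defined in this repository; see the paper and ZEN/modeling.py for the actual design.

```python
# Minimal conceptual sketch of n-gram enhancement (illustrative only; this is
# NOT the implementation in ZEN/modeling.py).
import torch
import torch.nn as nn


class NgramEnhancedLayer(nn.Module):
    """Fuses n-gram representations into character-level hidden states."""

    def __init__(self, hidden_size: int, ngram_vocab_size: int, num_heads: int = 8):
        super().__init__()
        self.ngram_embeddings = nn.Embedding(ngram_vocab_size, hidden_size)
        # A separate encoder layer over the extracted n-grams.
        self.ngram_encoder = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )

    def forward(self, char_hidden, ngram_ids, ngram_to_char):
        # char_hidden:   (batch, seq_len, hidden)      character-level hidden states
        # ngram_ids:     (batch, num_ngrams)           ids of n-grams found in the input
        # ngram_to_char: (batch, num_ngrams, seq_len)  1 where an n-gram covers a character
        ngram_hidden = self.ngram_encoder(self.ngram_embeddings(ngram_ids))
        # Scatter each n-gram representation onto the characters it covers and
        # add it to the character-level representation.
        return char_hidden + torch.bmm(
            ngram_to_char.transpose(1, 2).float(), ngram_hidden
        )


# Toy usage: two sentences of 16 characters, 5 candidate n-grams each.
layer = NgramEnhancedLayer(hidden_size=768, ngram_vocab_size=10000)
char_hidden = torch.randn(2, 16, 768)
ngram_ids = torch.randint(0, 10000, (2, 5))
ngram_to_char = torch.zeros(2, 5, 16)
ngram_to_char[:, 0, 0:2] = 1  # the first n-gram covers characters 0 and 1
out = layer(char_hidden, ngram_ids, ngram_to_char)  # (2, 16, 768)
```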

 

Citation

If you use or extend our work, please cite the following paper:

@article{Sinovation2019ZEN,
  title={ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations},
  author={Shizhe Diao and Jiaxin Bai and Yan Song and Tong Zhang and Yonggang Wang},
  journal={ArXiv},
  year={2019},
  volume={abs/1911.00720}
}

Quick tour of pre-training and fine-tuning using ZEN

The library comprises several example scripts for conducting Chinese NLP tasks:

  • run_pre_train.py: an example of pre-training ZEN
  • run_sequence_level_classification.py: an example of fine-tuning ZEN on DC (document classification), SA (sentiment analysis), SPM (sentence pair matching) and NLI (natural language inference) tasks (sequence-level classification)
  • run_token_level_classification.py: an example of fine-tuning ZEN on CWS (Chinese word segmentation), POS (part-of-speech tagging) and NER (named entity recognition) tasks (token-level classification)

Examples of pre-training and fine-tuning using ZEN can be found in the examples directory.
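For instance, a sequence-level fine-tuning run can be launched roughly as follows. The flag names below are assumptions modeled on common BERT fine-tuning scripts rather than a verified listing; consult the argument parser in run_sequence_level_classification.py for the exact options.

```bash
python run_sequence_level_classification.py \
  --task_name <DC|SA|SPM|NLI> \
  --do_train \
  --do_eval \
  --data_dir /path/to/task/data \
  --bert_model /path/to/pretrained/ZEN \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir /path/to/output
```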

Contact information

For help or issues using ZEN, please submit a GitHub issue.

For personal communication related to ZEN, please contact chenguimin (chenguimin@chuangxin.com).
