CWS

Source code for an ACL2016 paper of Chinese word segmentation

This code implements the word segmentation algorithm proposed in the following paper.

Deng Cai and Hai Zhao, Neural Word Segmentation Learning for Chinese. ACL 2016.

Latest update! We improved the system; the corresponding paper was accepted to ACL 2017, and its source code is available at this repo.

Update! A faster implementation using DyNet as the backend is now available. Use python -d to run the new (DyNet-based) version.

Usage (Theano version; also applies to the DyNet version):

- train

python -t. To train a model, first check the hyperparameter settings. At the very start, the training procedure writes a config file that preserves your hyperparameter settings, and it then saves the trained model parameters to *.npz after each epoch.
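To see what a per-epoch parameter file contains, something like the following minimal sketch may help. It assumes the *.npz files are ordinary NumPy archives (as written by `numpy.savez`); the helper name `summarize_checkpoint`, the file name `epoch1.npz`, and the parameter names in the demo are hypothetical and do not come from this repository.

```python
import os
import tempfile

import numpy as np


def summarize_checkpoint(path):
    """Return {parameter_name: shape} for every array stored in a .npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Demo on a stand-in checkpoint. The parameter names below are hypothetical;
# a real checkpoint from training will contain whatever names the model uses.
demo_path = os.path.join(tempfile.mkdtemp(), "epoch1.npz")
np.savez(demo_path, embeddings=np.zeros((100, 50)), output_bias=np.zeros(4))

shapes = summarize_checkpoint(demo_path)
for name, shape in sorted(shapes.items()):
    print(name, shape)
```

Pointing `summarize_checkpoint` at a real checkpoint is a quick way to confirm that the file matches the configuration you intend to test with.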

- test

python params.npz input_file output_path config_file. To test a trained model, specify the file that stores the model parameters (params.npz) and the corresponding configuration file (config_file). The test procedure reads data from input_file and writes the result to output_path.

- evaluate

For example, to reproduce the best result (F1-score 95.5) on the PKU dataset reported in our paper, first generate the output file with the trained model ( python best_pku.npz ../data/pku_test somepath best_pku_config), then run the command ./score ../data/dic ../data/pku_test somepath.
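For readers unfamiliar with how a segmentation F1-score is computed, here is a minimal sketch of word-level precision, recall, and F1. It is not the ./score script and does not reproduce its full output. It assumes both the gold file and the system output hold one sentence per line with words separated by whitespace; the function names `word_spans` and `segmentation_prf` are hypothetical.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold_lines, pred_lines):
    """Word-level precision, recall and F1 over parallel segmented lines."""
    n_gold = n_pred = n_correct = 0
    for gold, pred in zip(gold_lines, pred_lines):
        gold_spans = word_spans(gold.split())
        pred_spans = word_spans(pred.split())
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
        # A predicted word is correct only if both of its boundaries match.
        n_correct += len(gold_spans & pred_spans)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if n_correct else 0.0
    return precision, recall, f1


# Toy example: the prediction merges two gold words into one.
gold = ["我 喜欢 自然 语言 处理"]
pred = ["我 喜欢 自然语言 处理"]
p, r, f = segmentation_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

In the toy example, three of the four predicted words match the five gold words, so precision is 0.75 and recall is 0.6.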


Thanks to these excellent computing tools: DyNet, Theano, NumPy, Gensim.


Deng Cai. If you have any questions, feel free to contact me by email.


If you find this code useful, please cite our paper.

  author    = {Cai, Deng  and  Zhao, Hai},
  title     = {Neural Word Segmentation Learning for Chinese},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month     = {August},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {409--420},
  url       = {}

