Chinese Word Segmentation

State-of-the-art Chinese Word Segmentation with Bi-LSTMs (Ji Ma, Kuzman Ganchev and David Weiss, EMNLP 2018): https://aclweb.org/anthology/D18-1529

Compatibility

Python 3.6.x, TensorFlow 1.12.0

Notes

In this project, four Chinese datasets (AS, CITYU, MSR and PKU) were used to train a deep learning model for the Chinese word segmentation task. The datasets can be downloaded from: http://sighan.cs.uchicago.edu/bakeoff2005/

For Training

Run: python3 train.py

input_file_path is the path to the file that contains the unsegmented (no-space) Chinese sequences.

label_file_path is the path to the file that contains the BIES labels for those sequences.
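Since training depends on the input file and the label file staying aligned line by line and character by character, a quick sanity check before training can catch preprocessing mistakes early. The following is a minimal sketch, not part of train.py: the helper name `load_pairs` is hypothetical, and it assumes one sequence per line with exactly one BIES tag per character.

```python
def load_pairs(input_lines, label_lines):
    """Pair each unsegmented line with its BIES label line and validate both.

    Hypothetical helper (not from train.py). Accepts any iterables of strings,
    such as open file objects or lists. Note that zip() stops at the shorter
    input, so a differing line count is not detected here.
    """
    pairs = []
    for line_no, (chars, labels) in enumerate(zip(input_lines, label_lines), start=1):
        chars, labels = chars.strip(), labels.strip()
        # Every character must carry exactly one tag.
        if len(chars) != len(labels):
            raise ValueError(
                "line %d: %d characters but %d labels" % (line_no, len(chars), len(labels))
            )
        # Only the four BIES tags are allowed.
        unknown = set(labels) - set("BIES")
        if unknown:
            raise ValueError("line %d: unexpected tags %s" % (line_no, sorted(unknown)))
        pairs.append((chars, labels))
    return pairs


pairs = load_pairs(["我喜欢\n", "自然语言\n"], ["SBE\n", "BIIE\n"])
print(pairs)
```

With real data, the two arguments could be file objects opened with `encoding="utf-8"` on input_file_path and label_file_path.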

For Preprocessing

Run: python3 preprocess.py original_file_path input_file_path label_file_path

original_file_path is the path to the original file that contains the segmented Chinese sequences.

input_file_path is the path where the unsegmented (no-space) Chinese sequences are saved.

label_file_path is the path where the BIES labels for those sequences are saved.
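To make the BIES format concrete: each character is tagged B (begin), I (inside) or E (end) of a multi-character word, or S for a single-character word. The sketch below illustrates the conversion for one line; it is not the code in preprocess.py. The function name `bies_labels` is hypothetical, and it assumes words in the original file are separated by whitespace.

```python
def bies_labels(segmented_line):
    """Return (unsegmented text, BIES label string) for one segmented line.

    Hypothetical helper; assumes words are separated by whitespace.
    str.split() with no argument also splits on full-width spaces (U+3000).
    """
    chars, labels = [], []
    for word in segmented_line.split():
        chars.append(word)
        if len(word) == 1:
            labels.append("S")  # single-character word
        else:
            # First character B, last character E, everything between I.
            labels.append("B" + "I" * (len(word) - 2) + "E")
    return "".join(chars), "".join(labels)


text, labels = bies_labels("我 喜欢 自然语言处理")
print(text)    # 我喜欢自然语言处理
print(labels)  # SBEBIIIIE
```

The two returned strings always have the same length, which is the invariant the training inputs rely on.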

For Prediction

Run: python3 predict.py input_path output_path resources_path

input_path is the path to the file that contains the unsegmented (no-space) Chinese sequences.

output_path is the path to save the predictions in BIES format.

resources_path is the path to the saved model.

The saved model and extras can be downloaded from http://bit.ly/2PKGZBg and placed in the resources folder.
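Because the predictions are written as BIES tags rather than as segmented text, a small decoding step is needed to recover the words. The following is a minimal sketch under the assumption that each predicted line has one tag per input character; the function name `bies_to_words` is hypothetical and not part of predict.py.

```python
def bies_to_words(chars, labels):
    """Rebuild the word list from characters and their BIES tags.

    Hypothetical helper. A word is closed whenever an E or S tag is seen;
    a trailing unfinished word (for example a B with no E) is kept as-is.
    """
    words, current = [], ""
    for ch, tag in zip(chars, labels):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


print(" ".join(bies_to_words("我喜欢自然语言处理", "SBEBIIIIE")))
# 我 喜欢 自然语言处理
```

Closing words only on E and S makes the decoder tolerant of malformed tag sequences, which a model can occasionally produce.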

For Scoring

Run: python3 score.py prediction_file gold_file

prediction_file is the path to the file that contains the predictions in BIES format from the previous step.

gold_file is the path to the gold file in BIES format.
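One simple way to compare the two files is per-character tag accuracy: the fraction of characters whose predicted tag matches the gold tag. The sketch below illustrates that metric only; whether score.py reports this exact measure is an assumption, and the function name `tag_accuracy` is hypothetical.

```python
def tag_accuracy(predicted_lines, gold_lines):
    """Fraction of gold tags that the prediction matches, over all lines.

    Hypothetical helper; assumes the two inputs are aligned line by line.
    Tags missing from a shorter predicted line count as errors.
    """
    correct = total = 0
    for pred, gold in zip(predicted_lines, gold_lines):
        pred, gold = pred.strip(), gold.strip()
        total += len(gold)
        correct += sum(p == g for p, g in zip(pred, gold))
    return correct / total if total else 0.0


print(tag_accuracy(["BESS"], ["BEBE"]))  # 0.5
```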
