Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer
Switch branches/tags
Nothing to show
Clone or download
Latest commit dfa8488 May 8, 2018
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
model Initial commit May 8, 2018
utils Initial commit May 8, 2018
.gitattributes Initial commit May 8, 2018
.gitignore Initial commit May 8, 2018
Readme.md Initial commit May 8, 2018
demo.decode.config Initial commit May 8, 2018
demo.train.config Initial commit May 8, 2018
main.py Initial commit May 8, 2018
requirements.txt Initial commit May 8, 2018

Readme.md

LR-NER

This repository is the implemented code based on PyTorch of our paper 《Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer》,our approach achieved improvements on two low resource languages (including Dutch and Spanish) and Chinese OntoNotes 4.0 dataset.

1 Requirement

Python : 2.7
PyTorch : >=0.3.0

2 Installation

PyTorch

This code is based on PyTorch. You can find installation instructions here.

Dependencies

You can install dependencies like this :

pip install -r requirements.txt

3 Usage

The default configuration is in the file demo.train.config and demo.decode.config.You can modify the parameters as you want.

In training status: : CUDA_VISIBLE_DEVICES=0 python main.py --config demo.train.config

In decoding status : python main.py --config demo.decode.config

4 Dataset

Language Dataset Link
Dutch CoNLL-2002 https://github.com/synalp/NER/tree/master/corpus/
Spanish CoNLL-2002 https://github.com/synalp/NER/tree/master/corpus/
Chinese Ontonotes 4.0 https://catalog.ldc.upenn.edu/ldc2011t03
Translation Link
MUSE https://github.com/facebookresearch/MUSE
Word embedding Link
Glove(english) https://nlp.stanford.edu/projects/glove/

5 Reference

NCRF++: An Open-source Neural Sequence Labeling Toolkit

NER-pytorch