Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

Intro

This is an implementation of the paper

Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks
Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen
ICLR 2017

You might also want to refer to

Multi-Task Cross-Lingual Sequence Tagging from Scratch
Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen
Preprint, 2016

Requirements

Lasagne, Theano. Python 2.7.

Install Lasagne and Theano with the instructions here: https://github.com/Lasagne/Lasagne#installation

Get data

Publicly Available Data

Some of the datasets are publicly available, which can be downloaded from our server.

wget http://kimi.ml.cmu.edu/transfer/data.tar.gz
tar -xvzf data.tar.gz

The above command will download the Genia and Twitter datasets, along with the Senna embeddings and an English gazeteer.

Other datasets require a LDC license; please contact your institution to access the below datasets.

Get Chunking Dataset

Get the CoNLL 2000 chunking dataset using a LDC license, and organize the files with the following structure:

transfer/chunking/train.txt
transfer/chunking/test.txt

Get POS Dataset

Get the PennTreebank 2003 dataset using a LDC license, and organize the files with the following structure:

transfer/pos_tree/dev.txt
transfer/pos_tree/test.txt
transfer/pos_tree/train.txt

Get Spanish NER Dataset

Get the CoNLL 2003 Spanish NER dataset using a LDC license, and organize the files with the following structure:

transfer/span/esp.testa
transfer/span/esp.testb
transfer/span/esp.train

Get English NER dataset

Get the CoNLL 2003 English NER dataset using a LDC license, and organize the files with the following structure:

transfer/eng.testa.old
transfer/eng.testb.old
transfer/eng.train

Labeling Rates and Data Splits

For each dataset, we first concatenate the training set and the dev set (training set always first). And then use the following function (in sample.py) to sample a list of indices that are used for training.

def create_sample_index(rate, len):
    np.random.seed(13)
    return np.random.choice(len, int(rate * len))

where rate is the labeling rate, and len is the number of instances (training+dev). The function will return an np array of indices; other instances not in the list will be discarded during training.

You can use the above function to reproduce the data splits for comparison of different models.

Transfer Learning with Our Model

The transfer learning scripts are in joint.py and lang.joint.py, where joint.py is used for transfer learning within one language, and lang.joint.py is used to cross-lingual transfer learning.

joint.py accepts the following input formats:

python2 joint.py --tasks <target_task_name> <source_task_name> --labeling_rates <labeling_rate_for_target_task> <labeling_rate_for_source_task> [--very_top_joint]

where task names come from the list

[genia, pos, ner, chunking, ner_span, twitter_ner, twitter_pos]

and labeling rates are float numbers. The flag very_top_joint indicates whether to share the parameters of the CRF layer or not.

Below are examples of the transfer learning settings used in our paper (Fig. 2):

# transfer from PTB to Genia
python2 joint.py --tasks genia pos --labeling_rates <labeling_rate> 1.0 --very_top_joint

# transfer from CoNLL 2003 NER to Genia
python2 joint.py --tasks genia ner --labeling_rates <labeling_rate> 1.0

# transfer from Spanish NER to Genia
python2 lang.joint.py --tasks genia ner_span --labeling_rates <labeling_rate> 1.0

# transfer from PTB to Twitter POS tagging
python2 joint.py --tasks twitter_pos pos --labeling_rates <labeling_rate> 1.0

# transfer from CoNLL 2003 to Twitter NER
python2 joint.py --tasks twitter_ner ner --labeling_rates <labeling_rate> 1.0

# transfer from CoNLL 2003 NER to PTB POS tagging
python2 joint.py --tasks pos ner --labeling_rates <labeling_rate> 1.0

# transfer from PTB POS tagging to CoNLL 2000 chunking
python2 joint.py --tasks chunking pos --labeling_rates <labeling_rate> 1.0

# transfer from PTB POS tagging to CoNLL 2003 NER
python2 joint.py --tasks ner pos --labeling_rates <labeling_rate> 1.0

# transfer from CoNLL 2003 English NER to Spanish NER
python2 lang.joint.py --tasks ner_span ner --labeling_rates <labeling_rate> 1.0

# transfer from Spanish NER to CoNLL 2003 English NER
python2 lang.joint.py --tasks ner ner_span --labeling_rates <labeling_rate> 1.0

Our Results (With More Results Than in the Paper)

Target	Source	Labeling Rate	With Transfer	Without Transfer
genia	PTB	0.0	0.840899499608	N/A
genia	PTB	0.001	0.916581258415	0.832640019292
genia	PTB	0.01	0.963083539318	0.935592130383
genia	PTB	0.1	0.981953738872	0.978035007335
genia	PTB	1.0	0.990092642833	0.990655332489
genia	Eng NER	0.001	0.87471269687	0.832640019292
genia	Eng NER	0.01	0.941942485079	0.935592130383
genia	Eng NER	0.1	0.979944132956	0.978035007335
genia	Eng NER	1.0	0.989951970419	0.990655332489
genia	Span NER	0.001	0.843853620305	0.832640019292
genia	Span NER	0.01	0.93111070919	0.935592130383
genia	Span NER	0.1	0.978718273347	0.978035007335
genia	Span NER	1.0	0.989550049235	0.990655332489
PTB	Eng NER	0.001	0.87471269687	0.841578354698
PTB	Eng NER	0.01	0.949326669443	0.942871025961
PTB	Eng NER	0.1	0.967891464976	0.965916979037
PTB	Eng NER	1.0	0.974470513829	0.975334351428
Eng NER	PTB	0.001	0.346473029046	0.335092085615
Eng NER	PTB	0.01	0.749249658936	0.686385971674
Eng NER	PTB	0.1	0.870218090812	0.86219588832
Eng NER	PTB	1.0	0.91264717787	0.91208817241
Chunking	PTB	0.001	0.622235477654	0.58375524895
Chunking	PTB	0.01	0.867262565155	0.834900974403
Chunking	PTB	0.1	0.927242176013	0.90649356106
Chunking	PTB	1.0	0.953936031606	0.945709723506
Eng NER	Span NER	0.001	0.346253229974	0.335092085615
Eng NER	Span NER	0.01	0.726148735929	0.686385971674
Eng NER	Span NER	0.1	0.865126276196	0.86219588832
Eng NER	Span NER	1.0	0.912161558395	0.91208817241
Span NER	Eng NER	0.001	0.164485165794	0.115025161754
Span NER	Eng NER	0.01	0.604273247066	0.598373003917
Span NER	Eng NER	0.1	0.765227337718	0.745397008055
Span NER	Eng NER	1.0	0.848126232742	0.846034214619
Twitter POS	PTB	0.001	0.020282728949	0.00860479409957
Twitter POS	PTB	0.01	0.646588813768	0.503380454825
Twitter POS	PTB	0.1	0.836508912108	0.748002458513
Twitter POS	PTB	1.0	0.907191149355	0.893054701905
Twitter NER	Eng NER	0.001	0.0137931034483	0.00950118764846
Twitter NER	Eng NER	0.01	0.24154589372	0.0963855421687
Twitter NER	Eng NER	0.1	0.432432432432	0.346534653465
Twitter NER	Eng NER	1.0	0.6473029045	0.63829787234

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
LICENSE		LICENSE
README.md		README.md
chunking.py		chunking.py
cnn_rnn.py		cnn_rnn.py
genia.py		genia.py
joint.py		joint.py
lang.joint.py		lang.joint.py
ner.py		ner.py
ner_ned.py		ner_ned.py
ner_span.py		ner_span.py
pos.py		pos.py
sample.py		sample.py
twitter_ner.py		twitter_ner.py
twitter_pos.py		twitter_pos.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

Intro

Requirements

Get data

Publicly Available Data

Get Chunking Dataset

Get POS Dataset

Get Spanish NER Dataset

Get English NER dataset

Labeling Rates and Data Splits

Transfer Learning with Our Model

Our Results (With More Results Than in the Paper)

About

Releases

Packages

Languages

License

kimiyoung/transfer

Folders and files

Latest commit

History

Repository files navigation

Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

Intro

Requirements

Get data

Publicly Available Data

Get Chunking Dataset

Get POS Dataset

Get Spanish NER Dataset

Get English NER dataset

Labeling Rates and Data Splits

Transfer Learning with Our Model

Our Results (With More Results Than in the Paper)

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages