CMU multinlp

This repository contains scripts to download and preprocess the General Language Analysis Datasets (GLAD) benchmark and codes for the task-agnostic SpanRel model, a multi-faceted NLP toolkit that can cover many different tasks.

Generalizing Natural Language Analysis through Span-relation Representations (ACL2020)

Prerequisites

# install jsonnet from https://github.com/google/jsonnet
conda create -n spanrel python=3.6
conda activate spanrel
./setup.sh

Datasets

8 datasets consisting of annotations of 10 tasks are included in this repository.

Dataset	Task	Task code	Dir
Wet Lab Protocols	NER	wlp	data/wlp
	RE	wlp	data/wlp
CoNLL-2003	NER	ner	data/semeval_2014/
SemEval-2010 Task 8	RE	rc	data/semeval_2010_task8/
OntoNotes 5.0	Coref.	coref	data/conll_coref_2012/
	SRL	srl	data/conll_srl_2012/
	POS	pos_conll	data/conll_pos_2012/
	Dep.	dp_conll	data/conll_dep_2012/
	Consti.	consti_conll	data/conll_consti_2012/
Penn Treebank	POS	pos	data/ptb_pos/
	Dep.	dp	data/ptb/
	Consti.	consti	data/ptb_consti/
OIE2016	OpenIE	oie	data/openie/
MPQA 3.0	ORL	orl	data/mpqa/
SemEval-2014 Task 4	ABSA	semeval14_st2	data/semeval_2014/

Follow the instructions in run.sh in each dataset directory to download and preprocess the datasets into BRAT format.

Train and Evaluate SpanRel models

Run BERT-based models, where $emb can be bert-base-uncased, bert-large-uncased, and $task is one of the "task code" shown in the table.

./run_by_config_bert.sh $task $emb $output

Run GloVe/ELMo-based models, where $emb can be glove or elmo, and $task is one of the "task code" shown in the table.

./run_by_config.sh $task $emb $output

Train and Evaluate SpanRel models on other datasets

Put the data in data/kairos with train, dev, and test sub-directory, each containing multiple .ann and .txt files.
Build vocabulary: ./run_by_config_bert.sh kairos bert-base-uncased output/kairos_vocab.
Modify the vocab variable in the kairos section of run_by_config_bert.sh to output/kairos_vocab/vocabulary.
Train and evaluate: ./run_by_config_bert.sh kairos bert-base-uncased output/kairos_log.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
brat_multitask		brat_multitask
data		data
output/all_vocab/vocabulary		output/all_vocab/vocabulary
scripts		scripts
training_config/template		training_config/template
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
run.sh		run.sh
run_by_config.sh		run_by_config.sh
run_by_config_bert.sh		run_by_config_bert.sh
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CMU multinlp

Prerequisites

Datasets

Train and Evaluate SpanRel models

Train and Evaluate SpanRel models on other datasets

About

Releases

Packages

Languages

License

neulab/cmu-multinlp

Folders and files

Latest commit

History

Repository files navigation

CMU multinlp

Prerequisites

Datasets

Train and Evaluate SpanRel models

Train and Evaluate SpanRel models on other datasets

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages