LinguisticStructureLM: Transformer-based Language Modeling with Symbolic Linguistic Structure Representations

Published at NAACL-HLT 2022 as "Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling" by Jakob Prange, Nathan Schneider, and Lingpeng Kong.

Please cite as:

@inproceedings{prange-etal-2022-linguistic,
    title = "Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling",
    author = "Prange, Jakob  and
      Schneider, Nathan  and
      Kong, Lingpeng",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.325",
    pages = "4375--4391"
}

Setup

  1. To install dependencies, run: pip install -r requirements.txt
  2. Download the trained models into this directory.
  3. Obtain annotated data and store all training and evaluation files as FORMALISM.training.mrp and FORMALISM.validation.mrp (where FORMALISM is one of {dm, psd, eds, ptg, ud, ptb-phrase, ptb-func, empty}) in a subdirectory of this one called mrp/ (see the layout check below). Note: we used the annotated, MRP-formatted WSJ data, which we cannot release publicly here. Please contact me or open an issue! (You will most likely need an LDC license to obtain the data.)
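If you want to verify the data layout before training, here is a minimal sanity check (a sketch, assuming exactly the file naming described in step 3; this script is not part of the repo):

```python
from pathlib import Path

# Formalism names from step 3 above.
FORMALISMS = ["dm", "psd", "eds", "ptg", "ud", "ptb-phrase", "ptb-func", "empty"]

mrp_dir = Path("mrp")
for formalism in FORMALISMS:
    for split in ("training", "validation"):
        f = mrp_dir / f"{formalism}.{split}.mrp"
        print(f, "ok" if f.is_file() else "MISSING")
```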

Reproduce Paper Results

To reproduce the main results (Table 2 in the paper), complete the following steps:

  1. Edit lm_eval.sh to match your local environment.
  2. Run: sh eval_all_lm.sh
  3. eval.py writes the results to stdout, and lm_eval.sh collects them in a file called eval-dm,dm,psd,eds,ptg,ud,ptb-phrase,ptb-func-10-0001-0.0_0.0-0-14combined.out. Run: cat eval-dm,dm,psd,eds,ptg,ud,ptb-phrase,ptb-func-10-0001-0.0_0.0-0-14combined.out | grep ";all;" | grep gold. This gives you a bunch of semicolon-separated lines that you can paste into your favorite spreadsheet (or convert to CSV with the sketch below). Voila!
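If you would rather not paste into a spreadsheet by hand, this small sketch turns the filtered lines into a CSV file. The exact field layout of eval.py's output is not documented here, so the script simply splits each line on semicolons:

```python
import csv

# Same filename that lm_eval.sh produces (see step 3 above).
OUT_FILE = "eval-dm,dm,psd,eds,ptg,ud,ptb-phrase,ptb-func-10-0001-0.0_0.0-0-14combined.out"

with open(OUT_FILE) as f, open("results.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for line in f:
        # Mirror the two grep filters: aggregate ("all") rows, gold structures only.
        if ";all;" in line and "gold" in line:
            writer.writerow(line.rstrip("\n").split(";"))
```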

Other Usage

To get more info on the command-line arguments, run: python3 train.py or python3 eval.py

To evaluate a trained model more generally (this might require an additional input file; contact me!), edit lm_eval.sh to match your environment and directory structure, uncomment the lines you want in eval_all_lm.sh, and run: sh eval_all_lm.sh SEED, where SEED is the last number before .pt in the model name, as in the example below (currently only seed-14 models are available for download).
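For instance, here is how you could read off the seed programmatically (the filename is hypothetical; only the "last number before .pt" convention stated above is assumed):

```python
import re

# Hypothetical model filename; check your actual downloaded files.
model_name = "some-model-14.pt"
match = re.search(r"(\d+)\.pt$", model_name)
if match:
    print("SEED =", match.group(1))  # SEED = 14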

To train a new model (this requires access to .mrp-formatted and preprocessed data, which you can find here and/or contact me about), edit lm.sh to match your environment and directory structure, uncomment the lines you want in run_all_lm.sh, and run: sh run_all_lm.sh SEED, where SEED is a custom random seed of your choice.
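If you want to train with several seeds in a row, one way to script it is the sketch below (assuming, per the usage above, that run_all_lm.sh takes the seed as its only argument; the seed values are arbitrary examples):

```python
import subprocess

# Launch one training run per seed, stopping on the first failure.
for seed in (14, 42, 123):
    subprocess.run(["sh", "run_all_lm.sh", str(seed)], check=True)
```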
