Code accompanying Lexical Semantic Recognition, by Nelson F. Liu, Daniel Hershcovich, Michael Kranzlein, and Nathan Schneider.
If you're interested in pre-trained models, system output, or evaluation script output for the models in the paper, you can download them from Google Drive.
This project is being developed in Python 3.6.
Conda will set up a virtual environment with the exact version of Python used for development along with all the dependencies needed to run the code.
- Change your directory to your clone of this repo: `cd lexical-semantic-recognition-baselines`
- Create a Conda environment with Python 3.6: `conda create -n lsr python=3.6`
- Activate the Conda environment (you will need to do this in each terminal in which you want to run code from this repo): `conda activate lsr`
- Install the required dependencies: `pip install -r requirements.txt`
You should now be able to test your installation with `py.test -v`. Congratulations!
Training a model is as simple as:
allennlp train <train_config_path> \
    --include-package streusle_tagger \
    -s <path to save output to>
The training configs live in the `./training_config` folder. For instance, to run a BERT model with all constraints during inference but none during training, use `streusle_bert_large_cased/streusle_bert_large_cased_all_constraints_no_train_constraints.jsonnet`.
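For example, a full command with that config might look like the following sketch; the output directory under `./models` is just an illustrative name, so substitute whatever location you want the run saved to:

allennlp train training_config/streusle_bert_large_cased/streusle_bert_large_cased_all_constraints_no_train_constraints.jsonnet \
    --include-package streusle_tagger \
    -s models/streusle_bert_large_cased_all_constraints_no_train_constraints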
By changing the jsonnet training configs, you can control various aspects of your runs. For instance, by default, all the STREUSLE 4.x training configs look for data at `./data/streusle/streusle.ud_{train|dev|test}.json`. To train on other datasets, simply modify this path. You can also override it on the command line with:
allennlp train <train_config_path> \
    --include-package streusle_tagger \
    -s <path to save output to> \
    --overrides '{"train_data_path": "<train_path>", "validation_data_path": "<validation_path>", "test_data_path": "<test_path>"}'
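For example, to train on a different dataset without editing the config, you might run something like the sketch below; the `data/my_dataset/` paths and the output directory are purely hypothetical:

allennlp train training_config/streusle_bert_large_cased/streusle_bert_large_cased_all_constraints_no_train_constraints.jsonnet \
    --include-package streusle_tagger \
    -s models/my_dataset_run \
    --overrides '{"train_data_path": "data/my_dataset/train.json", "validation_data_path": "data/my_dataset/dev.json", "test_data_path": "data/my_dataset/test.json"}'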
To evaluate trained models, we have scripts at `./scripts`, e.g. `./scripts/evaluate_on_streusle.sh`. Each of these scripts loops through the models saved at `./models` and runs evaluation on them. Of course, if you have a particular model you want to evaluate, you can always run the commands below yourself in the shell, independent of the evaluation scripts. For instance, to evaluate a STREUSLE 4.x tagger, we would use https://github.com/nelson-liu/streusle-tagger/blob/master/scripts/evaluate_on_streusle.sh. On the shell:
- We start by generating predictions from the model.
allennlp predict <path to saved model.tar.gz output by allennlp> <data_to_predict_on_path> \
    --silent \
    --output-file <model output and predictions path> \
    --include-package streusle_tagger \
    --use-dataset-reader \
    --predictor streusle-tagger \
    --cuda-device 0 \
    --batch-size 64
Change `--cuda-device` to `-1` if using CPU, and modify the batch size as you see fit.
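Concretely, a filled-in version of that command might look like the sketch below; the model archive and output paths are hypothetical, and the dev split path follows the default data location mentioned above:

allennlp predict models/streusle_bert_large_cased_all_constraints_no_train_constraints/model.tar.gz data/streusle/streusle.ud_dev.json \
    --silent \
    --output-file models/streusle_bert_large_cased_all_constraints_no_train_constraints/dev_predictions.jsonl \
    --include-package streusle_tagger \
    --use-dataset-reader \
    --predictor streusle-tagger \
    --cuda-device 0 \
    --batch-size 64

The resulting `dev_predictions.jsonl` is then what you would pass as `<model output and predictions path>` in the next step.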
Next, we evaluate with the official STREUSLE metric:
./scripts/streusle_eval/streuseval.sh \
    data/streusle/streusle.ud_dev.conllulex \
    <model output and predictions path>
This should write the metrics to the same folder as `<model output and predictions path>`.
To get predictions from a trained model, format your input data into JSON-lines format. Each line should contain a JSON dictionary that looks like `{"tokens": [...], "upos_tags": [...], "lemmas": [...]}`, where `"tokens"`, `"upos_tags"`, and `"lemmas"` are each lists of strings. If you are using a model with constraints, providing `"upos_tags"` and `"lemmas"` will override any generated by the model itself. If they are not provided, any model that needs them should generate them automatically.
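For example, you could create a small input file like the following; the file name, sentence, tags, and lemmas are purely illustrative:

cat > example_input.jsonl <<'EOF'
{"tokens": ["The", "dog", "barked", "."], "upos_tags": ["DET", "NOUN", "VERB", "PUNCT"], "lemmas": ["the", "dog", "bark", "."]}
EOF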
Then, we can pass this file into `allennlp predict` to write out a file with predictions and model internals:
allennlp predict <path to saved model.tar.gz output by allennlp> <data_to_predict_on_path.jsonl> \
    --silent \
    --output-file <model output and predictions path> \
    --include-package streusle_tagger \
    --predictor streusle-tagger \
    --cuda-device 0 \
    --batch-size 64