CONAN - CONtinuous pAtterNs

This repository contains the code to reproduce the results from the EMNLP 2021 short paper "Continuous Entailment Patterns for Lexical Inference in Context".

If this code is useful for you, please consider citing:

@inproceedings{schmitt-schutze-2021-continuous,
    title = "Continuous Entailment Patterns for Lexical Inference in Context",
    author = {Schmitt, Martin  and
      Sch{\"u}tze, Hinrich},
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
	note = "To appear"
}

Data

Please see the instructions in the repository Language Models for Lexical Inference in Context about retrieving the data.

Code

This code base is built upon the code in Language Models for Lexical Inference in Context. You can refer to the documentation there for most scripts. The train script was renamed to src/train/train.py and there is only one. The only new arguments, --num_patterns and --num_tokens_per_pattern, correspond to the hyperparamters n and k and should be self-explanatory.

The following additional scripts are provided:

src/analysis/contokens_nn.py
src/analysis/create_heatmap.py
src/train/n_k_loop.py

Nearest Neighbor Analysis

The script src/analysis/contokens_nn.py computes the nearest neighbors of continuous tokens in the subword embedding space. It is called like this:

python3 -m src.analysis.contokens_nn PATH_TO_CHECKPOINT NUM_PATTERNS NUM_TOKENS_PER_PATTERN

where

PATH_TO_CHECKPOINT is a model checkpoint stored after training
NUM_PATTERNS is the number of patterns (called n in the paper)
NUM_TOKENS_PER_PATTERN is the number of continuous tokens per pattern (called k in the paper)

Optimize n and k

The script src/train/n_k_loop.py performs a grid search over all configurations for n and k considered in the paper. It takes mostly the same arguments as the train script; the arguments --start_num_patterns, --start_num_tokens_per_pattern, --start_version can be useful for restarting an aborted loop. The results are stored in a tsv file named after the specified experiment name.

The visualization of these results can be done with src/analysis/create_heatmap.py.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CONAN - CONtinuous pAtterNs

Data

Code

Nearest Neighbor Analysis

Optimize n and k

About

Releases

Packages

Languages

License

mnschmit/conan

Folders and files

Latest commit

History

Repository files navigation

CONAN - CONtinuous pAtterNs

Data

Code

Nearest Neighbor Analysis

Optimize n and k

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages