- Clone the repository:
    git clone git@github.com:kvah/analyzing_verb_alternations_plms.git
- cd into the clone directory:
    cd analyzing_verb_alternations_plms/
- Create the conda environment:
    conda env create -n 575nn --file ./conda_environment.yaml
- Activate the conda environment:
    conda activate 575nn
- Install alternationprober as an editable package with pip:
    pip install -e .
To run the tests: pytest ./tests
Provides the following:
- get_bert_word_embeddings: command-line utility to produce static word-level embeddings from the LaVA dataset.
  - usage: get_bert_word_embeddings [--model_name]
  - model_name: bert-base-uncased, roberta-base, google/electra-base-discriminator, microsoft/deberta-base
  - Will produce 2 output files:
    - ./data/embeddings/static/{model_name}.npy: a 2d numpy array with the word embeddings.
    - ./data/embeddings/bert-word-embeddings-lava-vocab.json: a mapping of each vocabulary item to its index in the numpy array. (The order is the same as in the original file in the LaVA dataset.)
The embeddings and associated vocabulary mapping can be loaded like so:
import json
import numpy as np
from alternationprober.constants import (
    PATH_TO_BERT_WORD_EMBEDDINGS_FILE,
    PATH_TO_LAVA_VOCAB,
)

# 2d array: one row per vocabulary item.
embeddings = np.load(PATH_TO_BERT_WORD_EMBEDDINGS_FILE, allow_pickle=True)

# Mapping of vocabulary item -> row index in `embeddings`.
with PATH_TO_LAVA_VOCAB.open("r") as f:
    vocabulary_to_index = json.load(f)
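Once both objects are loaded, a verb's vector is simply the row at its index. The sketch below illustrates that lookup on toy data rather than the real files; `lookup_embedding` and the toy arrays are hypothetical names introduced here, not part of alternationprober.

```python
import numpy as np


def lookup_embedding(word, embeddings, vocabulary_to_index):
    """Return the embedding row for `word`, or None if it is not in the vocabulary."""
    index = vocabulary_to_index.get(word)
    if index is None:
        return None
    return embeddings[index]


# Toy stand-ins for the real files (assumption): 3 verbs, 4-dimensional embeddings.
toy_embeddings = np.arange(12, dtype=np.float32).reshape(3, 4)
toy_vocabulary_to_index = {"break": 0, "cut": 1, "give": 2}

vector = lookup_embedding("cut", toy_embeddings, toy_vocabulary_to_index)
print(vector)
```

With the real data, the same function should work unchanged on the `embeddings` and `vocabulary_to_index` objects loaded above.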
- get_bert_context_word_embeddings: command-line utility to produce contextual layer embeddings from the LaVA dataset.
  - usage: get_bert_context_word_embeddings [--model_name]
  - model_name: bert-base-uncased, roberta-base, google/electra-base-discriminator, microsoft/deberta-base
  - Produces the following output file:
    - ./data/embeddings/context/{model_name}.npy: a 2d numpy array with the contextual word embeddings.
The contextual embeddings can be loaded like so:
import numpy as np
from alternationprober.constants import PATH_TO_BERT_CONTEXT_WORD_EMBEDDINGS_FILE
context_embeddings = np.load(PATH_TO_BERT_CONTEXT_WORD_EMBEDDINGS_FILE)
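To locate the output for a particular model without going through the package constants, the path template above can be filled in directly. This is only a sketch of that template: `context_embeddings_path` and `SUPPORTED_MODEL_NAMES` are hypothetical names, and it assumes the package writes model names containing a slash (such as google/electra-base-discriminator) as nested directories, which this README does not confirm.

```python
from pathlib import Path

# Model names listed in this README.
SUPPORTED_MODEL_NAMES = (
    "bert-base-uncased",
    "roberta-base",
    "google/electra-base-discriminator",
    "microsoft/deberta-base",
)


def context_embeddings_path(model_name, data_root="./data"):
    """Fill in ./data/embeddings/context/{model_name}.npy for a listed model."""
    if model_name not in SUPPORTED_MODEL_NAMES:
        raise ValueError(f"unknown model name: {model_name!r}")
    # A "/" in the model name becomes a sub-directory (assumption, see above).
    return Path(data_root) / "embeddings" / "context" / f"{model_name}.npy"


bert_path = context_embeddings_path("bert-base-uncased")
electra_path = context_embeddings_path("google/electra-base-discriminator")
print(bert_path.as_posix())
print(electra_path.as_posix())
```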
- run_linear_classifier_experiment: runs our experiment to predict alternation classes from static model embeddings derived from the LaVA dataset.
  - usage: run_linear_classifier_experiment [--output_directory] [--use_context_embeddings] [--model_name]
  - Example: to run the linear classifier experiment with contextual embeddings:
      run_linear_classifier_experiment --use_context_embeddings --model_name bert-base-uncased
  - Note: <output_directory> will default to ./results/linear-probe-for-word-embeddings
  - Note: ./download-datasets.sh and the jupyter notebook ./data_analysis.ipynb must be run first to make the data available.
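For readers unfamiliar with linear probing, the idea is to fit a linear classifier on frozen embeddings and treat its accuracy as evidence of what the embeddings encode. The following is a minimal, generic sketch using logistic regression on synthetic vectors. It is not the classifier, hyperparameters, or evaluation that run_linear_classifier_experiment uses, none of which are described here; all names and data below are illustrative assumptions.

```python
import numpy as np


def train_linear_probe(features, labels, learning_rate=0.5, epochs=500):
    """Fit a binary logistic-regression probe with full-batch gradient descent."""
    n_samples, n_features = features.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(epochs):
        logits = features @ weights + bias
        probabilities = 1.0 / (1.0 + np.exp(-logits))
        error = probabilities - labels  # gradient of the log-loss w.r.t. the logits
        weights -= learning_rate * (features.T @ error) / n_samples
        bias -= learning_rate * error.mean()
    return weights, bias


def predict(features, weights, bias):
    """Predict class 1 wherever the linear score is positive."""
    return (features @ weights + bias > 0).astype(int)


# Synthetic stand-in for verb embeddings (assumption): two well-separated
# Gaussian clusters, one per class of a single binary alternation label.
rng = np.random.default_rng(0)
n_per_class, dim = 50, 8
negatives = rng.normal(loc=-1.0, scale=0.5, size=(n_per_class, dim))
positives = rng.normal(loc=1.0, scale=0.5, size=(n_per_class, dim))
features = np.vstack([negatives, positives])
labels = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

weights, bias = train_linear_probe(features, labels)
accuracy = (predict(features, weights, bias) == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```

A real probe would be evaluated on held-out verbs rather than on its own training data; the sketch skips that split for brevity.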
- run_linear_classifier_sentence_experiment: runs our experiment to predict sentence grammaticality from contextual embeddings derived from the FAVA dataset.
  - usage: run_linear_classifier_sentence_experiment [--output_directory] [--use_context_embeddings] [--model_name]
  - Example: to run the sentence-level linear classifier experiment with contextual embeddings:
      run_linear_classifier_sentence_experiment --use_context_embeddings --model_name bert-base-uncased
  - Note: <output_directory> will default to ./results/linear-probe-for-sentence-embeddings
  - Note: ./download-datasets.sh and the jupyter notebook ./data_analysis.ipynb must be run first to make the data available.
- download-datasets.sh: shell script to download the LaVA and FAVA datasets to the ./data directory.
  - usage: sh download-datasets.sh