You Are My Type! Type Embeddings for Pre-trained Language Models @ EMNLP 2022 (Findings)

(Paper)

[Figure: Type Embedding Example]

One reason for the positive impact of Pretrained Language Models (PLMs) in NLP tasks is their ability to encode semantic types, such as ‘European City’ or ‘Woman’. While previous work has analyzed such information in the context of interpretability, it is not clear how to use types to steer the PLM output. For example, in a cloze statement, it is desirable to steer the model to generate a token that satisfies a user-specified type, e.g., predict a date rather than a location.

In this work, we introduce Type Embeddings (TEs), an input embedding that promotes desired types in a PLM. Our proposal is to define a type by a small set of word examples. We empirically study the ability of TEs both in representing types and in steering masked predictions in BERT without changes to the prompt text. Finally, using the LAMA datasets, we show how TEs substantially improve the precision of extracting facts from PLMs.
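As a rough illustration of the idea (not the paper's exact construction, which is implemented in src/GetTypeVecs.py), one could build a crude vector for a type by averaging BERT's input embeddings over a few example words:

# Illustrative sketch only: average BERT input embeddings of a few
# example words to get one vector per type. The paper's actual
# construction lives in src/GetTypeVecs.py.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
embed = model.get_input_embeddings()

# A small word set defining the type 'European City'
examples = ["Paris", "Berlin", "Madrid", "Rome"]

with torch.no_grad():
    piece_ids = [tok(w, add_special_tokens=False)["input_ids"] for w in examples]
    # Average sub-word pieces per word, then average across the word set
    word_vecs = [embed(torch.tensor(ids)).mean(dim=0) for ids in piece_ids]
    type_vec = torch.stack(word_vecs).mean(dim=0)   # shape: (hidden_size,)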

This repo contains the code required to run the experiments from the associated paper.

I. Installation

0. Clone Repo

git clone https://github.com/MhmdSaiid/TypeEmbedding
cd TypeEmbedding

1. Create a virtual environment and install requirements

virtualenv TE -p $(which python3)
source TE/bin/activate
pip install -r requirements.txt

2. Unzip Data

Unpack the datasets and type embeddings used for the experiments:

bash exps/prepare_data.sh

II. TE Analysis

III. TE MLM Experiments

  • Intrinsic Experiments (Section X.X)

    bash exps/runItr.sh
    python src/print_avg.py --res_dir "results/ProcessedDatasets" 
  • Run Extrinsic Experiments (Section X.X) (Under Construction)

  • Type Switch (exps/Type_Switch.ipynb)

  • Sampling Variations (a sketch of the two random strategies follows this list)

    • Sampling Method: bash exps/SampleMethod.sh
    • Number of Samples: bash exps/SampleNum.sh
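For intuition, here is a minimal sketch of the two random strategies, assuming each candidate word carries an importance weight (the values below are hypothetical; the repo's actual sampling logic sits behind exps/SampleMethod.sh):

# Minimal sketch of the two random sampling strategies.
# Candidates and weights are hypothetical placeholders.
import random

candidates = ["Paris", "Berlin", "Madrid", "Rome", "Vienna"]
weights = [0.4, 0.2, 0.2, 0.1, 0.1]   # hypothetical importance scores

unif = random.sample(candidates, k=3)                # Unif: uniform, without replacement
weighted = random.choices(candidates, weights, k=3)  # Weighted: proportional to weight, with replacement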

IV. TE NLG Experiments

V. Generate Your Own Type Embedding

You can generate your own Type Embeddings using the following script (a usage sketch follows the argument list below):

python src/GetTypeVecs.py --model_arch 'bert-base-cased' \
                          --path 'data/KG Samples/' \
                          --seed 0 \
                          --num_samples 10
  • --model_arch: model architecture according to HuggingFace's Transformers library
  • --path: folder containing samples for types in csv format
  • --seed: seed value for reproducibility
  • --num_samples: number of samples used
  • --sample_type: sampling method; choose between uniform random sampling (Unif), random weighted sampling (Weighted), most important samples (Top), and least important samples (Bot)
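Once you have a type vector, one plausible way to use it is to add it to the input embedding at the [MASK] position before decoding. Note this is only an illustration under that assumption; how TEs are actually injected is defined by the paper and the repo code:

# Sketch: steer a masked prediction by adding a type vector at the
# [MASK] position. One plausible illustration, not the repo's exact
# injection mechanism.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased").eval()

# Placeholder vector: substitute a real TE produced by src/GetTypeVecs.py
type_vec = torch.zeros(model.config.hidden_size)

enc = tok("The capital of France is [MASK].", return_tensors="pt")
mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero().item()

with torch.no_grad():
    inputs_embeds = model.get_input_embeddings()(enc["input_ids"]).clone()
    inputs_embeds[0, mask_pos] += type_vec   # inject the type embedding
    logits = model(inputs_embeds=inputs_embeds,
                   attention_mask=enc["attention_mask"]).logits

# Top-5 predicted tokens for the masked slot
print(tok.convert_ids_to_tokens(logits[0, mask_pos].topk(5).indices.tolist()))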

Contact Us

For any inquiries, feel free to contact us or raise an issue on GitHub.

Reference

You can cite our work:

@inproceedings{saeed-etal-2022-TE,
  title = {You Are My Type! Type Embeddings for Pre-trained Language Models},
  author = {Saeed, Mohammed and Papotti, Paolo},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2022},
  month = dec,
  year = {2022},
  address = {Online and Abu Dhabi, UAE},
  publisher = {Association for Computational Linguistics},
}
