Temporal Blind Spots in LLMs

This repo contains experiments and necessary steps to reproduce the results from our WSDM '24 paper "Temporal Blind Spots in LLMs".

Conda Environment

conda create -n llm-inference 
conda activate llm-inference
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
pip install black tqdm jupyter
pip install sentencepiece
pip install datasets
pip install git+https://github.com/huggingface/transformers
pip install chardet cchardet
pip install accelerate
pip install gradio
pip install -r requirements.txt

Furthermore, you might want to setup Weights and Biases.

Datasets and Preprocessing

We are using TemporalQuestions and ArchivalQA. Currently, it is necessary to manually capitalize the "question" and "answer" the first line in the ArchivalQA files. I put these into a /data directory. Additionally, we sample datasets from ArchivalQA. TODO: Description/code to set everything up.

Steps to prepare datasets:

Create a data directory (e.g., /data)
Download Archival QA from here: https://github.com/WangJiexin/ArchivalQA
Download TemporalQuestions from here: https://www.dropbox.com/sh/fdepuisdce268za/AACtiPDaO_RwLCwhIwaET4Iba?dl=0
Download TempLAMA from here: https://github.com/google-research/language/tree/master/language/templama
Move the datasets to your data directory.

Prompts

I used different options, which are specified in the prompts.json file. You can swap them there and specify the correct key as the --prompt_name argument. Default is "gpt_style" which we use in the experiments in the paper

Models

The experiments use text-davinci-003 which requires you to register an API key at OpenAI and export it to OPENAI_API_KEY like

export OPENAI_API_KEY=<key>

Running Experiments

Running the experiments will mostly look like this:

python alpaca_query.py \
    --ds_path=/data/Event-focused Questions/Explicitly Time-Scoped Questions.csv \
    --model_name=alpaca-7b \
    --prompt_path=prompts.json \
    --prompt_name=gpt_style \
    --batch_size=$batch_size

The --ds_path is pointing toward the individual .csv files. --model_name might be "alpaca-7b" (more options in alpaca_query.py argparse arguments). --prompt_path should point toward a json file with prompts for the models. Per default, this is pointing at the prompts.json file. Lastly, --prompt_name is the key in the prompt_path file.

Running this for all models will result in the big results table (Table 3).

Results

Results will be written to standard output, a file with the model predictions will be saved, and logged to weights and biases (if set up).

Additional Experiments

Additional experiments will require additional steps. The scripts are in the /scripts/ folder.

Time Stratification

The file /scripts/time_stratification.ipynb covers the necessary steps. The configuration necessary is in the first cell. It is necessary to specify the dataset, prediction files, and interval information.

Time Referencing

For the time referencing experiments please have a look at /scripts/absolute_relative_time_refs.ipynb to create the datasets. If these datasets exist, just run the standard experiment setup and pass the corresponding datasets (that were generated in here).

Temporal Errors

Temporal Errors can be computed via the /scripts/temporal_error_checks.ipynb file. Again, you need to pass the according dataset and predictions files in the second cell.

Can't find CUDA libdevice

I ran into this error a couple times. It seems that this was a common problem with the current tensorflow version. Check out the solution down here: https://www.tensorflow.org/install/pip For me the CUDA path did not work out, so I needed to set CUDA_HOME: $CUDA_HOME=/home/wallat/.conda/envs/llm-inference/

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
scripts		scripts
src		src
.gitignore		.gitignore
Supplementary_Material_Temporal Blind Spots in Large Language Models.pdf		Supplementary_Material_Temporal Blind Spots in Large Language Models.pdf
alpaca_query.py		alpaca_query.py
gpt_query.py		gpt_query.py
prompts.json		prompts.json
readme.md		readme.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Temporal Blind Spots in LLMs

Conda Environment

Datasets and Preprocessing

Prompts

Models

Running Experiments

Results

Additional Experiments

Time Stratification

Time Referencing

Temporal Errors

Can't find CUDA libdevice

Citation

About

Releases

Packages

Languages

jwallat/temporalblindspots

Folders and files

Latest commit

History

Repository files navigation

Temporal Blind Spots in LLMs

Conda Environment

Datasets and Preprocessing

Prompts

Models

Running Experiments

Results

Additional Experiments

Time Stratification

Time Referencing

Temporal Errors

Can't find CUDA libdevice

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages