This repository contains the code implementation of "Estimating Large Language Model Capabilities without Labeled Test Data" by Harvey Yiyun Fu, Qinyuan Ye, Albert Xu, Xiang Ren, and Robin Jia. It covers both the in-context learning LLM inference used to generate meta-training data and the meta-model training.
```
pip install torch==1.13.1
pip install transformers==4.22.1
```
To run experiments with the MMLU and MCQA datasets:

```
cd mcqa
```

To run experiments with the CBQA datasets:

```
cd cbqa
```

Please see `mcqa/config.py` and `cbqa/config.py` for the full ontology of datasets in each collection.
While under `mcqa/`, run the following command to do inference using the OPT model on the MMLU/MCQA datasets:

```
python opt_mmlu_worker.py \
  --model_size opt-6.7b --num_shots 5 --temperature 0 --template mmlu --seed 1
```
- `--model_size`: the size of the OPT model, such as `opt-6.7b` or `opt-13b`
- `--num_shots`: number of few-shot examples in the prompt
- `--template`: the prompt template used to demonstrate the few-shot examples; choose from `mmlu`, `subject`, `gopher`, `gpt`, and `user`
- `--temperature`: hyperparameter to control the randomness
- `--seed`: random seed
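Generating meta-training data typically requires running the worker across several model sizes and seeds. A minimal sketch of how such runs could be batched; the loop itself (and the particular sizes and seeds chosen) is illustrative and not part of this repo:

```python
import itertools
import shlex

# Illustrative choices; adjust to the runs you need.
MODEL_SIZES = ["opt-6.7b", "opt-13b"]
SEEDS = [1, 2, 3]

def build_command(model_size: str, seed: int) -> list:
    """Assemble one opt_mmlu_worker.py invocation as an argv list."""
    cmd = (f"python opt_mmlu_worker.py --model_size {model_size} "
           f"--num_shots 5 --temperature 0 --template mmlu --seed {seed}")
    return shlex.split(cmd)

if __name__ == "__main__":
    for size, seed in itertools.product(MODEL_SIZES, SEEDS):
        # Swap print for subprocess.run(...) to actually launch the jobs.
        print(" ".join(build_command(size, seed)))
```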
Similarly, while under `cbqa/`, run the following command to do inference using the OPT model on the CBQA datasets:

```
python opt_worker.py \
  --model_size opt-6.7b --num_shots 5 --seed 1
```
- `--model_size`: the size of the OPT model, such as `opt-6.7b` or `opt-13b`
- `--num_shots`: number of few-shot examples in the prompt
- `--seed`: random seed
Then, under either directory, run

```
python transform_embed.py
```

to retrieve and store the PCA-transformed embeddings.
Under `mcqa/`, run

```
python train_classifier.py \
  --setting cv --cv_k 5 --tasks mmlu --num_unlabeled 1000 --data_dim 100 --only_size 13B \
  --seed 1 --llama --mmlu --metric conf
```
- `--setting`: general setting for the train/test split
- `--cv_k`: number of splits for cross-validation
- `--tasks`: task defined in `config.py` as the metadata; choose from `mmlu` and `mcqa`
- `--num_unlabeled`: how much data to include in a single confidence profile
- `--data_dim`: dimension of the confidence profile
- `--only_size`: use only inference results from the specified LLM size
- `--only_shots`: use only inference results from the specified k-shot setting
- `--llama`/`--opt`: use the LLaMA model or the OPT model
- `--mmlu`/`--mcqa`: do inference on `mmlu` or `mcqa`
- `--metric`: metric for processing the confidence profile; choose from `conf`, `pca_embed`, and `conf_embed`
- `--train_size`: number of seeds to include in the training/test data
- `--seed`: random seed
- `--do_sigmoid`, `--dropout`, `--lr`, `--lr_lambda`, `--num_epochs`: MLP hyperparameters
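To give a sense of what a confidence profile is under the `conf` metric, here is a hedged numpy sketch: per-example model confidences on `--num_unlabeled` examples are sorted and resampled down to a `--data_dim`-length vector. The interpolation details below are illustrative assumptions, not necessarily what `train_classifier.py` does:

```python
import numpy as np

def confidence_profile(confidences: np.ndarray, data_dim: int = 100) -> np.ndarray:
    """Sort per-example confidences and resample to a fixed-length vector."""
    sorted_conf = np.sort(confidences)
    # Linearly interpolate the sorted curve down to `data_dim` points
    # (an illustrative choice of downsampling scheme).
    positions = np.linspace(0, len(sorted_conf) - 1, data_dim)
    return np.interp(positions, np.arange(len(sorted_conf)), sorted_conf)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    conf = rng.uniform(size=1000)   # stand-in for --num_unlabeled confidences
    profile = confidence_profile(conf, data_dim=100)
    print(profile.shape)  # (100,)
```

The resulting fixed-length vectors are what the meta-model (the MLP configured by the hyperparameter flags above) consumes as input features.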
Under `cbqa/`, run

```
python train_classifier.py \
  --setting cv --cv_k 5 --tasks cbqa --num_unlabeled 1000 --data_dim 100 --only_size llama13B \
  --seed 1 --llama --metric conf
```
- `--setting`: general setting for the train/test split
- `--cv_k`: number of splits for cross-validation
- `--tasks`: task defined in `config.py` as the metadata; choose from `cbqa` and `seq2seq`
- `--num_unlabeled`: how much data to include in a single confidence profile
- `--data_dim`: dimension of the confidence profile
- `--only_size`: use only inference results from the specified LLM size
- `--only_shots`: use only inference results from the specified k-shot setting
- `--llama`/`--opt`: use the LLaMA model or the OPT model
- `--mmlu`/`--mcqa`: do inference on `mmlu` or `mcqa`
- `--metric`: metric for processing the confidence profile; choose from `conf`, `pca_embed`, and `conf_embed`
- `--seed`: random seed
- `--do_sigmoid`, `--dropout`, `--lr`, `--lr_lambda`, `--num_epochs`: MLP hyperparameters
We did not include the meta-training data in this repo due to its large size. We also did not include the inference code for LLaMA models to avoid potential copyright issues. We thank 🤗 Hugging Face Datasets for making the datasets and LLMs easily accessible.