This package is a vLLM-enabled derivative of the popular bigcode-evaluation-harness. It offers feature parity with the parent project as of 13th September 2024.
In our experiments, evaluations using this harness can be up to 12x faster than the original harness. We also release a Dockerfile with all the generation dependencies and evaluation toolchains pre-installed, which can be used to run the evaluation harness in a reproducible manner.
NOTE: Due to limitations imposed by vLLM, this harness only supports autoregressive models, not encoder-decoder ones.
We provide instructions for running evaluations on a cluster with Apptainer (formerly Singularity) installed. The Apptainer execution model is preferable because it does not require privileged access, which makes it safer to execute arbitrary, model-generated programs. However, the harness should be runnable on any system with Docker installed using the template provided in the Dockerfiles directory, as sketched below.
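For reference, a minimal Docker-based sketch might look like the following; the exact Dockerfile name, image tag, and mount path are assumptions rather than part of the repository:

# Build the image from the template in the Dockerfiles directory (exact file name assumed)
docker build -t code_inference -f Dockerfiles/Dockerfile .
# Run it with GPU access and the current checkout mounted; paths and tag are illustrative
docker run --rm -it --gpus all -v "$(pwd):/workspace" code_inference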
- Pull the image from Docker Hub and create an Apptainer SIF image. We provide a pre-built image on Docker Hub, based on the one in the Dockerfiles directory. You can pull it using the following command:
apptainer pull docker://ineil77/code_inference:latest
- Create an overlay filesystem volume for the container to use. This is useful because the evaluations generate a large number of files, and the overlay filesystem keeps them from cluttering the host filesystem. (TIP: Ask your sysadmin to raise the inode limit on the host filesystem. Apptainer simply inherits the host filesystem's inode limit, and a small limit is a common cause of failed evaluation runs. You can check current inode usage with the command shown after the overlay step below.)
apptainer overlay create --fakeroot --size 4096 overlay.img
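To see whether inodes are likely to become a problem before launching long runs, you can inspect per-filesystem inode usage on the host:

# A high IUse% on the partition holding the overlay and output directories is a warning sign
df -i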
- Check out the harness repository and navigate to the directory containing the evaluation harness. If you do not have the GitHub CLI, a plain git alternative is shown after the commands.
gh repo clone iNeil77/vllm-code-harness
cd vllm-code-harness
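If the GitHub CLI (gh) is unavailable, cloning with plain git should work equally well; the HTTPS URL below is inferred from the repository name:

git clone https://github.com/iNeil77/vllm-code-harness.git
cd vllm-code-harness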
- Create a bash script named recode.sh that takes a HF model as an argument and runs a task (e.g. ReCode). For a full list of arguments, refer to the bigcode_eval/arguments.py file.
#!/bin/bash
# Runs the ReCode (perturbed HumanEval) evaluation for the HF model passed as the first argument.
modelname="$1"

# Activate the pre-installed inference environment shipped with the container.
source /Miniforge/etc/profile.d/conda.sh
source /Miniforge/etc/profile.d/mamba.sh
mamba activate inference

# Authenticate with the Hugging Face Hub so gated or private models can be downloaded.
python -c 'from huggingface_hub import HfFolder; HfFolder.save_token("YOUR_TOKEN_HERE")'

# Greedy, single-sample generation and evaluation for each perturbation split.
for split in "func_name"
do
    python main.py \
        --model "$modelname" \
        --max_length_generation 1024 \
        --tasks "perturbed-humaneval-$split-num_seeds_5" \
        --temperature 0.0 \
        --n_samples 1 \
        --precision "bf16" \
        --allow_code_execution \
        --continuous_batching_size 64 \
        --swap_space 128 \
        --save_references_path "/Outputs/$modelname/recode/$split/references.json" \
        --save_generations_path "/Outputs/$modelname/recode/$split/generations.json" \
        --metric_output_path "/Outputs/$modelname/recode/$split/metrics.json"
done
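Before launching a run, replace YOUR_TOKEN_HERE with a valid Hugging Face token. Depending on how the container's runscript invokes its arguments, the script may also need to be executable:

chmod +x recode.sh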
- Run the evaluation harness using the following shell script; once a run finishes, the results can be inspected on the host as shown after it:
#!/bin/bash
# Loop over the models to evaluate; each iteration runs recode.sh inside the container.
for modelname in "bigcode/starcoderbase-1b" "deepseek-ai/deepseek-coder-1.3b-base"
do
    # --nv exposes the host GPUs inside the container; --fakeroot avoids the need for privileged access.
    apptainer run \
        --nv \
        --fakeroot \
        --bind "/Path/To/Current/Folder:/:ro" \
        --bind "/Path/To/Current/Folder/Outputs:/Outputs:rw" \
        --overlay "overlay.img" \
        /Path/To/SIF/Containers/code_inference_latest.sif recode.sh "$modelname"
done
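Once the loop finishes, the metrics files land under the Outputs folder that was bound into the container. For example, assuming the bind paths above, the ReCode results for the first model can be pretty-printed with:

# Host-side path mirrors the /Outputs paths written by recode.sh inside the container
python -m json.tool /Path/To/Current/Folder/Outputs/bigcode/starcoderbase-1b/recode/func_name/metrics.json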