We introduce SRank, a novel reranking strategy for selecting the best solution from code generation, which focuses on modeling inter-cluster relationships. By quantifying the functional overlap between clusters, our approach provides a better ranking strategy for code solutions. Empirical results show that our method achieves remarkable pass@1 scores. For instance, on the HumanEval benchmark, we achieve 69.66% pass@1 with Codex002, 75.31% with WizardCoder, 53.99% with StarCoder, and 60.55% with CodeGen, surpassing state-of-the-art solution ranking methods such as CodeT and Coder-Reviewer on the same CodeLLMs by a significant margin (≈6.1% improvement on average). Compared to random sampling, we achieve an average improvement of ≈23.07% on HumanEval and 17.64% on MBPP. Even in scenarios with limited test inputs, our approach demonstrates robustness and superiority, marking a new state of the art in code generation reranking.
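As a rough illustration of the idea (not the exact implementation in this repo), solutions can be grouped into clusters by their execution outputs on a shared set of test inputs, and each cluster scored by its size-weighted functional overlap with the other clusters. All function names and the exact scoring rule below are assumptions for illustration only:

```python
# Illustrative sketch of cluster-based reranking via functional overlap.
# Names and the scoring rule are assumptions, not this repo's API.
from collections import defaultdict

def cluster_solutions(solutions, exec_outputs):
    """Group solutions whose outputs agree on every shared test input."""
    clusters = defaultdict(list)
    for sol, outs in zip(solutions, exec_outputs):
        clusters[tuple(outs)].append(sol)
    return list(clusters.items())  # [(output_signature, member_solutions), ...]

def functional_overlap(sig_a, sig_b):
    """Fraction of test inputs on which two clusters produce the same output."""
    agree = sum(a == b for a, b in zip(sig_a, sig_b))
    return agree / len(sig_a)

def rerank(solutions, exec_outputs):
    """Order solutions so that members of the highest-scoring cluster come first."""
    clusters = cluster_solutions(solutions, exec_outputs)
    scored = []
    for i, (sig_i, members_i) in enumerate(clusters):
        # Score a cluster by its size-weighted overlap with all other clusters.
        score = sum(
            len(members_j) * functional_overlap(sig_i, sig_j)
            for j, (sig_j, members_j) in enumerate(clusters)
            if j != i
        )
        scored.append((score, members_i))
    scored.sort(key=lambda x: -x[0])
    return [sol for _, members in scored for sol in members]
```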
- All experiments are run with python==3.9.17.
- Install pyminifier from source. Installing pyminifier requires reverting setuptools to an older version (pip install setuptools==57.5.0). For other issues when installing pyminifier, check out their issues for potential fixes.
- Install human-eval from source.
- Install the other packages with pip install -r requirements.txt.
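After installing, a quick sanity check that the key dependencies import correctly (module names follow each package's usual layout and are assumptions here):

```python
# Quick environment check; module names are assumed from each package's usual layout.
import sys
print(sys.version)  # expect 3.9.17

import pyminifier                          # installed from source (needs setuptools==57.5.0)
from human_eval.data import read_problems  # human-eval installed from source

problems = read_problems()                 # HumanEval problems bundled with the package
print(f"Loaded {len(problems)} HumanEval problems")
```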
Available models:
- wizardcoder34B
- wizardcoder15B
- codegen25
- starcoder
- davinci002
- codegen16B
Available datasets:
- humaneval
- mbpp
- apps
The processed results will be saved at the following locations with pre-defined file names:
- Post-processed code solutions:
generation/gen_code/preds/${dataset}/${model}/postprocessed_T${temperature}_N${num_samples}.jsonl
- Post-processed test cases:
generation/gen_test/preds/${dataset}/${model}/postprocessed_T${temperature}_N${num_samples}.jsonl
- Execution results:
execution/results/${dataset}/${model}/T${temperature}_N${num_samples}/*
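To inspect the post-processed solutions, a small loader along these lines should work. The path below is assembled from the pattern above for one model/dataset pair, and the record fields are repo-specific, so inspect the keys rather than assuming a schema:

```python
import json

path = "generation/gen_code/preds/humaneval/wizardcoder/postprocessed_T0.8_N100.jsonl"
with open(path) as f:
    records = [json.loads(line) for line in f]

print(f"{len(records)} post-processed solutions")
print(records[0].keys())  # inspect the available fields rather than assuming them
```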
cd generation/gen_code/sh
./run.sh ${device_ids} ${model} ${dataset} ${max_sequence_length} ${number_of_sequences} ${running_script}
For example, running wizardcoder on humaneval:
cd generation/gen_code/sh
./run.sh 0,1,2,3 wizardcoder humaneval 2048 8 wizardcoder.py
Results are saved to generation/gen_code/preds/${dataset}/${model}/T${temperature}_N${num_samples}/
cd generation/gen_test/sh
./run.sh ${device_ids} ${model} ${dataset} ${max_sequence_length} ${number_of_sequences} ${running_script}
For example, running wizardcoder on humaneval:
cd generation/gen_test/sh
./run.sh 0,1,2,3 wizardcoder humaneval 2048 8 wizardcoder.py
Results are saved to generation/gen_test/preds/${dataset}/${model}/T${temperature}_N${num_samples}/
cd generation/gen_code/sh
./postprocess.sh ${model} ${dataset}
Results are saved to generation/gen_code/preds/${dataset}/${model}/postprocessed_T${temperature}_N${num_samples}.jsonl
cd generation/gen_test/sh
./postprocess.sh ${model} ${dataset}
Results are saved to generation/gen_test/preds/${dataset}/${model}/postprocessed_T${temperature}_N${num_samples}.jsonl
cd execution/sh
./run.sh ${model} ${dataset}
Execution results are saved to execution/results/${dataset}/${model}/T${temperature}_N${num_samples}/. The folder contains the following files:
- ground_truth_exec_result.pkl: Execution results of code solutions on ground-truth test cases, as provided by the benchmark datasets.
- model_generated_test_cases.pkl: Processed model-generated test cases, excluding those with syntactic and partially semantic inaccuracies.
- test_inputs_exec_result.pkl: Execution outputs of code solutions on model-generated test cases.
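These .pkl files can be inspected with pickle. Their internal structure is produced by this repo's execution code, so the sketch below only prints types and sizes rather than assuming a schema; the directory path is an example for one model/dataset pair:

```python
import pickle

result_dir = "execution/results/humaneval/wizardcoder/T0.8_N100"
for name in ("ground_truth_exec_result.pkl",
             "model_generated_test_cases.pkl",
             "test_inputs_exec_result.pkl"):
    with open(f"{result_dir}/{name}", "rb") as f:
        obj = pickle.load(f)
    # Structure is repo-specific; start by inspecting the type and size.
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    print(name, type(obj).__name__, size)
```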
Available reranking methods:
- attention
- random
cd reranking/sh
./run.sh ${model} ${dataset} ${temperature} ${num_samples} ${reranking_method}
For example, running reranking for wizardcoder15B on humaneval:
cd reranking/sh
./run.sh wizardcoder humaneval 0.8 100 attention
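The reported metric is pass@1 over the reranked solutions, i.e. the fraction of problems whose top-ranked solution passes the ground-truth tests. A minimal sketch of that computation (the function and argument names are illustrative, not the repo's interface):

```python
def rerank_pass_at_1(ranked_solutions, passes_ground_truth):
    """Fraction of problems whose top-ranked solution passes the hidden tests.

    ranked_solutions: {problem_id: [solution_id, ...]} with the best solution first.
    passes_ground_truth: {(problem_id, solution_id): bool} from ground-truth execution.
    """
    hits = sum(
        passes_ground_truth[(pid, ranked[0])]
        for pid, ranked in ranked_solutions.items()
        if ranked
    )
    return hits / len(ranked_solutions)
```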
This code base is adapted from