
CodePromptEval: Evaluating the impact of prompt programming on code generation


This repository contains a dataset, CodePromptEval, built on the functions of the CoderEval Python dataset (Yu et al., 2024). CodePromptEval consists of 7,072 prompts covering 221 code-generation tasks, where each task is instantiated with all 32 unique combinations of five prompt techniques: Few-shot learning, Persona, Chain-of-Thought, Function Signature (context), and List of Packages (context).
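The prompt count follows directly from the design: each of the five techniques is either applied or not, giving 2^5 = 32 combinations per task, and 221 tasks × 32 combinations = 7,072 prompts. A minimal Python sketch of the enumeration (the technique labels below are shorthand for illustration):

from itertools import product

# The five prompt techniques; each is either applied or not in a given prompt.
TECHNIQUES = ["few-shot", "persona", "chain-of-thought",
              "function-signature", "list-of-packages"]

# All 2^5 = 32 on/off combinations of the five techniques.
combinations = list(product([False, True], repeat=len(TECHNIQUES)))
assert len(combinations) == 32

# 221 tasks x 32 combinations per task = 7,072 prompts in total.
assert 221 * len(combinations) == 7072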

In addition, we provide the replication package of the study "The Impact of Prompt Programming on Function-Level Code Generation" by Khojah et al. (2024). The replication package contains the original CoderEval, the additional tests and few-shot examples that we added to CoderEval, the scripts that we used to construct CodePromptEval and evaluate it on five LLMs (GPT-3.5, GPT-4o, Llama3-70B, Llama2-7B, and Mistral), as well as the LLM outputs with the generated functions and the evaluation results.

This replication package also includes the raw results of a manual inspection of 40 generated functions that passed or failed depending on which prompt techniques were applied.

To cite this work:

@article{khojah2024impact,
  title={{The Impact of Prompt Programming on Function-Level Code Generation}},
  author={Khojah, Ranim and Neto, Francisco Gomes de Oliveira and Mohamad, Mazen and Leitner, Philipp},
  journal={arXiv preprint arXiv:2412.20545},
  year={2024}
}

Install dependencies

# (optional) create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# install packages
pip install -r requirements.txt
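Once the dependencies are installed, the dataset can be explored with standard Python tooling. A minimal sketch, assuming the prompts ship as a CSV file; the file name and layout below are hypothetical, so check the repository for the actual dataset files:

import pandas as pd

# Hypothetical file name, for illustration only.
df = pd.read_csv("CodePromptEval.csv")

# Inspect the first few prompts and their technique combinations.
print(df.head())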

Contact

Please contact khojah{at}chalmers.se if you have any questions.
