Maniple

This repository contains the code, scripts, and data necessary to reproduce the results of the paper "The Fact Selection Problem in LLM-Based Program Repair".

Installation

Before installing the project, ensure you have the following prerequisites installed on your system:

  • Python version 3.10 or higher.

Follow these steps to install and set up the project on your local machine:

cd maniple
python3 -m pip install .

Structure of Directories

The project is organized into several directories, each serving a specific purpose:

data/               # Training and testing datasets
  BGP32/            # Sampled 32 bugs from the BugsInPy dataset
    black/          # The bug project folder
      10/           # The bug ID folder
        100000001/      # The bitvector used for prompting
          prompt.md         # The prompt used for this bitvector
          response_1.md     # The response from the model
          response_1.json   # The response in JSON format
          response_1.patch  # The response in patch format
          result_1.json     # Testing result
    ...
  BGP314/           # 314 bugs from the BugsInPy dataset

maniple/            # Scripts for extracting facts and generating prompts
  strata_based/     # Scripts for generating prompts
  utils/            # Utility functions
  tests/            # Test scripts
  metrics/          # Scripts for calculating metrics for dataset

experiment.ipynb    # Jupyter notebook for training models

experiment-initialization-resources/  # Contains raw facts for each bug
  bug-data/         # Raw facts for each bug
    ansible/        # Bug project folder
      5/            # Bug ID folder
        bug-info.json              # Metadata for the bug
        facts_in_prompt.json       # Facts used in the prompt
        processed_facts.json       # Processed facts
        external_facts.json        # GitHub issues for this bug
        static-dynamic-facts.json  # Static and dynamic facts
    ...
  datasets-list/    # Subsets from BugsInPy dataset
  strata-bitvector/ # Debugging information for bitvectors

Due to its large size, BGP314 is not stored in this repository, but it is available on Zenodo: https://zenodo.org/records/10853003.
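The layout above can be traversed programmatically. The following is a minimal sketch, assuming only the `<project>/<bug_id>/<bitvector>/` nesting shown in the tree; the helper name `iter_bitvector_dirs` is hypothetical and not part of the maniple package.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_bitvector_dirs(dataset_dir: Path) -> Iterator[Tuple[str, str, str, Path]]:
    """Yield (project, bug_id, bitvector, path) for each bitvector folder.

    Assumes the <project>/<bug_id>/<bitvector>/ nesting shown in the tree
    above; a bitvector folder is recognized by a name made only of 0s and 1s.
    """
    for project_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        for bug_dir in sorted(p for p in project_dir.iterdir() if p.is_dir()):
            for bv_dir in sorted(p for p in bug_dir.iterdir() if p.is_dir()):
                if bv_dir.name and set(bv_dir.name) <= {"0", "1"}:
                    yield project_dir.name, bug_dir.name, bv_dir.name, bv_dir


if __name__ == "__main__":
    root = Path("data/BGP32")
    if root.is_dir():  # only list entries when run from the repository root
        for project, bug_id, bitvector, path in iter_bitvector_dirs(root):
            print(project, bug_id, bitvector, (path / "prompt.md").exists())
```

The sketch does not parse the JSON files, since their schema is not described here; it only locates the folders.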

Steps to Reproduce the Experiments

Please follow the steps below in order to reproduce the experiments on the 314 bugs in BugsInPy with our bitvector-based prompts.

Prepare the Dataset

First, ensure that the python3.7 command is available globally on your system. If it is not, install it manually with the following commands:

cd /tmp/
wget https://www.python.org/ftp/python/3.7.17/Python-3.7.17.tgz
tar xzf Python-3.7.17.tgz
cd Python-3.7.17

sudo ./configure --prefix=/opt/python/3.7.17/ --enable-optimizations --with-lto --with-computed-gotos --with-system-ffi
sudo make -j "$(nproc)"
sudo make altinstall
sudo rm /tmp/Python-3.7.17.tgz
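Whether `python3.7` is actually reachable can be checked with a short script. This is a minimal sketch using only the standard library; the helper names `find_interpreter` and `interpreter_version` are hypothetical and not part of maniple.

```python
import shutil
import subprocess
from typing import Optional


def find_interpreter(command: str = "python3.7") -> Optional[str]:
    """Return the absolute path of `command` if it is on PATH, else None."""
    return shutil.which(command)


def interpreter_version(path: str) -> str:
    """Run `<path> --version` and return the reported version string."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    # Python 3.4+ prints the version to stdout; fall back to stderr just in case.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    path = find_interpreter()
    if path is None:
        print("python3.7 not found on PATH")
    else:
        print(path, interpreter_version(path))
```

With the `--prefix` used above, the binary would be installed under `/opt/python/3.7.17/bin/`, so that directory may need to be added to `PATH` before the check succeeds.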

Then install the required dependencies and prepare the per-bug environments. The CLI scripts under the `maniple` directory provide commands for this.

To download and prepare the environment for each bug, use the `prep` command:

```sh
bgp update_bug_records
maniple prep --dataset experiment-initialization-resources/datasets-list/314-dataset.json --envs-dir ~/Documents/maniple-env --bugdata-dir ~/Documents/maniple-bugsdata
```

This script will automatically download all 314 bugs from GitHub, create a virtual environment for each bug, and install the necessary dependencies.

Fact Extraction

Then you can extract facts from the bug data using the extract command as follows:

maniple extract --dataset 314-dataset --output-dir data/BGP314

This script will extract facts from the bug data and save them in the specified output directory.

You can find all extracted facts under the experiment-initialization-resources/bug-data directory.

Generate Bitvector Specific Prompts and Responses

First, you need to generate the bitvectors for the facts. The 128 bitvectors used in our paper can be generated with the following command:

python3 -m maniple.strata_based.fact_bitvector_generator

You can also define your own bitvectors; they should be placed under the experiment-initialization-resources/strata-bitvectors directory. Refer to the bitvectors used for our paper as an example of the expected format.
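To make the idea of enumerating bitvectors concrete, here is a minimal sketch. It assumes seven hypothetical strata over nine bit positions, chosen only so that the count matches the 128 bitvectors mentioned above and the nine-character folder names such as `100000001`. The grouping is invented for illustration; the real strata design and file format used by `fact_bitvector_generator` may differ.

```python
from itertools import product
from typing import Dict, List

# HYPOTHETICAL strata: each stratum owns one or more bit positions that are
# switched on or off together. The real grouping is defined by the
# repository's strata table and may differ.
EXAMPLE_STRATA: Dict[str, List[int]] = {
    "s1": [0],
    "s2": [1],
    "s3": [2, 3],
    "s4": [4],
    "s5": [5, 6],
    "s6": [7],
    "s7": [8],
}


def enumerate_bitvectors(strata: Dict[str, List[int]]) -> List[str]:
    """Return one bitvector string per on/off combination of the strata."""
    width = 1 + max(pos for positions in strata.values() for pos in positions)
    groups = list(strata.values())
    bitvectors = []
    for choice in product((0, 1), repeat=len(groups)):
        bits = ["0"] * width
        for enabled, positions in zip(choice, groups):
            if enabled:
                for pos in positions:
                    bits[pos] = "1"
        bitvectors.append("".join(bits))
    return bitvectors


if __name__ == "__main__":
    vectors = enumerate_bitvectors(EXAMPLE_STRATA)
    print(len(vectors), vectors[0], vectors[-1])
```

Seven independent on/off choices give 2^7 = 128 combinations, which is why grouping facts into strata keeps the number of prompts manageable compared with toggling every bit separately.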

To reproduce the prompts and responses from our experiments, use the commands below, replacing <YOUR_OPENAI_KEY> with your own key.

On Linux/macOS:
# if you want to use OpenAI backend
export OPENAI_API_KEY=<YOUR_OPENAI_KEY>

# if you want to use Ollama backend
export USE_OLLAMA=true

# run LLM query
python3 -m maniple.strata_based.prompt_generator --database BGP314 --partition 10 --trial 15 --model "gpt-3.5-turbo-0125"

Again, you can build your own custom prompts with custom bitvectors using our extracted facts. The command above is only for reproducing our prompts and responses.
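How the two environment variables above might be interpreted can be sketched as follows. This is an assumption about the selection logic, not the code in `maniple.strata_based.prompt_generator`; the function name `select_backend` and the precedence of `USE_OLLAMA` over `OPENAI_API_KEY` are hypothetical.

```python
import os
from typing import Mapping, Optional


def select_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "ollama" or "openai" based on the environment variables.

    HYPOTHETICAL precedence: USE_OLLAMA=true wins over OPENAI_API_KEY.
    """
    env = os.environ if env is None else env
    if env.get("USE_OLLAMA", "").strip().lower() == "true":
        return "ollama"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("Set OPENAI_API_KEY or USE_OLLAMA=true before running the query")


if __name__ == "__main__":
    try:
        print(select_backend())
    except RuntimeError as err:
        print(err)
```

Failing early when neither variable is set avoids starting a long run that would only error out on the first request.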

This script generates prompts and responses for all 314 bugs in the dataset by enumerating all possible bitvectors according to the current strata design specified in maniple/strata_based/fact_strata_table.json. The --trial 15 option makes the script generate 15 responses for each prompt, --partition 10 starts 10 threads to speed up the process, and --model selects which LLM to use by name.
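The interaction between the number of trials and the number of threads can be illustrated with a small sketch. It is not the repository's implementation: `fake_query` is a stand-in that never contacts a model, `run_trials` is a hypothetical helper, and how the real script divides work among its threads is not specified here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def fake_query(prompt: str, trial: int) -> str:
    """Stand-in for an LLM call; returns a deterministic placeholder."""
    return f"response {trial} for {prompt}"


def run_trials(
    prompts: List[str],
    trials: int,
    partitions: int,
    query: Callable[[str, int], str] = fake_query,
) -> Dict[str, List[str]]:
    """Collect `trials` responses per prompt using `partitions` worker threads."""
    jobs: List[Tuple[str, int]] = [
        (prompt, trial) for prompt in prompts for trial in range(1, trials + 1)
    ]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        # pool.map preserves job order, so responses stay sorted by trial number.
        outputs = list(pool.map(lambda job: query(*job), jobs))
    results: Dict[str, List[str]] = {prompt: [] for prompt in prompts}
    for (prompt, _), output in zip(jobs, outputs):
        results[prompt].append(output)
    return results


if __name__ == "__main__":
    results = run_trials(["prompt A", "prompt B"], trials=15, partitions=10)
    print({prompt: len(responses) for prompt, responses in results.items()})
```

Threads suit this workload because each request spends most of its time waiting on the network rather than computing.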

Testing Generated Patches

Please use the following command:

maniple validate --output-dir data/BGP314

This script will validate the generated patches for the specified bug and save the results in the specified output directory. The tests are taken from the developer's fix commit.
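After validation, a quick completeness check can show which patches still lack a result. The sketch below assumes only the `response_N.patch` and `result_N.json` naming shown in the directory tree, and that each patch is expected to have a matching result file; the helper `missing_results` is hypothetical.

```python
import re
from pathlib import Path
from typing import List

PATCH_NAME = re.compile(r"^response_(\d+)\.patch$")


def missing_results(bitvector_dir: Path) -> List[int]:
    """Return trial numbers that have a patch file but no result file."""
    missing = []
    for entry in bitvector_dir.iterdir():
        match = PATCH_NAME.match(entry.name)
        if match and not (bitvector_dir / f"result_{match.group(1)}.json").exists():
            missing.append(int(match.group(1)))
    return sorted(missing)


if __name__ == "__main__":
    example = Path("data/BGP32/black/10/100000001")
    if example.is_dir():  # only meaningful when run from the repository root
        print(missing_results(example))
```

An empty list means every patch in that bitvector folder has been tested.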

Contributing

Contributions to this project are welcome! Please submit a PR if you find any bugs or have any suggestions.

License

This project is licensed under the MIT License; see the LICENSE file for details.
