
LLMAO

LLMAO is a large language model-based fault localization tool, associated with our paper published at ICSE 2024. For paper replication, skip to Section III. For replication of this tool on Defects4J at the file level, skip to Section V.

I. Requirements

We recommend using Docker for LLMAO.

# Pull the Hugging Face Docker image, which includes most requirements

docker pull huggingface/transformers-pytorch-gpu:4.21.0

# Run a container, bind-mounting your own directory path into it. We assume an Nvidia GPU exists, as training and loading an LLM requires a significant amount of GPU VRAM.

docker run -it --mount type=bind,src="path-to-local-directory",dst=/home huggingface/transformers-pytorch-gpu:4.21.0

# Install some additional dependencies
pip install --upgrade pip
pip install accelerate
pip install torchdata

ROC plots and AUC scores:

python3 plotter.py plotfiles
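
For reference, here is a minimal sketch of the kind of ROC/AUC computation plotter.py performs (illustrative only; the real script reads the saved model logs under plotfiles, and scikit-learn plus matplotlib are assumed to be available in the container):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Toy ground truth (1 = line is buggy) and model suspiciousness scores
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.2f}")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.savefig("roc.png")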

II. Demo

We include two example code files here for demonstration: demo_code.c (actual vulnerable lines: 52-62) and demo_code.java (actual buggy lines: 20-30).

python3 demo.py $demo_type $pretrain_type $code_file_path
# Example
python3 demo.py devign 16B demo_code.c


Output:
line-52 sus-15.86%:         DISAS_INSN(divw)
...
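
The demo prints one suspiciousness score per code line. As a convenience, here is a small sketch for post-processing that output (the demo_out.txt file name and the exact line format are assumptions based on the example above):

import re

def top_suspicious(output_text, k=5):
    # Parse lines of the form "line-<n> sus-<p>%: <code>" and
    # return the k most suspicious line numbers, highest score first.
    pattern = re.compile(r"line-(\d+) sus-([\d.]+)%")
    hits = [(float(p), int(n)) for n, p in pattern.findall(output_text)]
    return [line for _, line in sorted(hits, reverse=True)[:k]]

# Example usage (assumes the demo output was saved to demo_out.txt):
# print(top_suspicious(open("demo_out.txt").read()))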

Minimum VRAM (GPU memory) required for loading each of the checkpoints:

350M: 2.6GB

6B: 14.7GB

16B: 38GB (we recommend at least 2-3 GPUs)

The 16B checkpoint for Defects4J is too large to host on GitHub or standard cloud drives; for replication, please use the 350M or 6B checkpoints.
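
To spread a large checkpoint across several GPUs, the accelerate package installed above can shard the model automatically. A minimal sketch, assuming the underlying CodeGen base checkpoints from Hugging Face (LLMAO's own fine-tuned weights are loaded separately by its scripts):

from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Salesforce/codegen-16B-multi"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",   # shard layers across the available 2-3 GPUs
    torch_dtype="auto",  # load in the checkpoint's native precision
)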

III. Obtain some top scores

Top scores:

python3 top_scores.py model_logs $pretrain_type
# Example
python3 top_scores.py model_logs 6B
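
For reference, the top-N metric counts a bug as localized if any ground-truth buggy line ranks among the N most suspicious lines. A minimal sketch (the exact log format read by top_scores.py is an assumption):

def top_n_hit(scores, buggy_lines, n):
    # scores: {line_number: suspiciousness}; buggy_lines: set of ints
    ranked = sorted(scores, key=scores.get, reverse=True)[:n]
    return any(line in buggy_lines for line in ranked)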

IV. Train the model yourself

Download Dataset

  1. Download the dataset used in this research from the following link:

    data

  2. Unzip it and place the folder in the same path as this repo.

  3. Load the CodeGen final hidden states: bash codegen_loading.sh. Set biggest_model=1 to use CodeGen-16B (this requires a significant amount of GPU VRAM and storage).

  4. Train the model: bash fault_localizer.sh

  5. Regenerate the results: python3 top_scores.py and python3 plotter.py
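
For intuition, here is a minimal sketch of the model that fault_localizer.sh trains: a small bidirectional transformer head on top of the frozen CodeGen final hidden states, predicting a per-line suspiciousness score. All layer sizes below are assumptions; see Section 3.1 of the paper and the training scripts for the real configuration.

import torch
import torch.nn as nn

class FaultLocalizerHead(nn.Module):
    def __init__(self, hidden_dim=1024, n_heads=8, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, line_states):
        # line_states: (batch, 128 lines, hidden_dim) frozen LLM states
        x = self.encoder(line_states)  # bidirectional attention over lines
        return torch.sigmoid(self.classifier(x)).squeeze(-1)  # per-line scores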

V. Run LLMAO on file level

LLMAO was neither trained nor evaluated at the file level due to its limited context window of 128 lines. The training and evaluation procedure of LLMAO is described in Section 3.1 of the paper.

To run LLMAO on a much larger file, one approach is to split the file into multiple chunks of 128 lines and combine the scores at the end (see the sketch at the end of this section). However, chunking discards valuable context that spans the entire file, so buggy or vulnerable lines spread across multiple chunks cannot be accurately detected. We include the method for running LLMAO on entire Defects4J files in this replication package to showcase this limitation of our LLM-based fault localization, and we hope it will shrink as LLMs grow larger and can process significantly larger context windows. Enter the following:

python3 top_score_window.py

Output:

Top score for llmao_window
top 5: 77
top 3: 52
top 1: 24

Top score for Transfer
top 5: 145
top 3: 126
top 1: 69

Here, LLMAO shows much weaker results than Transfer-FL, a prior fault localization approach that is trained on Defects4J for each individual bug.

To regenerate LLMAO's file-level scores:

python3 llmao_d4j_window.py
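
For reference, a sketch of the 128-line chunking scheme mentioned above: score each window independently and merge the per-line scores back into file coordinates. score_chunk is a hypothetical stand-in for a call into the LLMAO model.

def localize_file(lines, score_chunk, window=128):
    scores = {}
    for start in range(0, len(lines), window):
        chunk = lines[start:start + window]
        for offset, s in enumerate(score_chunk(chunk)):
            scores[start + offset + 1] = s  # 1-indexed file line numbers
    return scores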
