OPTIMATCH
Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities
(Reproduction of Experiments)

How to reproduce

Environment Setup

First, clone this repository to your local machine and enter its main directory:

```
git clone https://github.com/optimatch/optimatch.git
cd optimatch
```

Then, install the Python dependencies:

```
pip install -r requirements.txt
```

We highly recommend following the official installation guide for the "torch" library (PyTorch) so that you install the version appropriate for your device.
To use a GPU (optional), you also need to install the CUDA toolkit; see NVIDIA's CUDA installation guide.
Python 3.9.7 is recommended; it has been fully tested without issues.
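Before running any experiments, it can help to confirm which Python and torch versions are active and whether torch can see a GPU. The snippet below is a minimal sketch; the function name `check_env` is hypothetical and not part of the repository. It relies only on `python3 --version`, `torch.__version__`, and `torch.cuda.is_available()`.

```shell
# Hypothetical helper (not part of the repository): report the Python and
# torch setup that the experiment scripts will pick up.
check_env() {
    # Python version (3.9.7 is the tested version)
    python3 --version
    # Print the torch version and CUDA availability, or a notice if torch is missing
    python3 -c '
try:
    import torch
except ImportError:
    print("torch: not installed")
else:
    print("torch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
'
}

check_env
```

If "CUDA available: False" is printed on a machine with a GPU, the installed torch build most likely does not match your CUDA setup.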

Reproduction of Experiments

Download and unzip the necessary data with the following commands:

```
cd data
sh download_data.sh
cd ..
```

Reproduce Main Results (Table 1 in the paper)

OPTIMATCH (proposed approach)

Inference

```
cd our_method/optimatch/saved_models/checkpoint-best-f1
sh download_models.sh
cd ../..
sh test_phase_2_150pat.sh
cd ../..
```

Retrain Phase 1 Model

```
cd our_method/optimatch
sh train_phase_1.sh
cd ../..
```

Retrain Phase 2 Model

```
cd our_method/optimatch
sh train_phase_2_150pat.sh
cd ../..
```
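Because each of the steps above changes directories, a small wrapper that runs a script inside a subshell keeps your working directory unchanged whether the script succeeds or fails. This is a minimal sketch; `run_in` is a hypothetical helper name, and it assumes you call it from the repository root.

```shell
# Hypothetical helper (not part of the repository): run a script from a given
# directory inside a subshell, so the caller's working directory never changes.
# Usage: run_in <directory> <script>
run_in() {
    # The parentheses start a subshell; its `cd` does not leak back to the caller.
    # The function's exit status is that of the script (or of a failed `cd`).
    ( cd "$1" && sh "$2" )
}

# Example usage from the repository root (scripts as listed above):
# run_in our_method/optimatch/saved_models/checkpoint-best-f1 download_models.sh
# run_in our_method/optimatch test_phase_2_150pat.sh
# run_in our_method/optimatch train_phase_1.sh
# run_in our_method/optimatch train_phase_2_150pat.sh
```

Using positional parameters directly, rather than assigning them to variables, avoids leaking variable names into the calling shell, since POSIX `sh` has no `local`.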

Baselines

To reproduce the baseline approaches, follow the steps below:
- Step 1: cd to the "./baselines" folder.
- Step 2: cd to the folder of the baseline you wish to reproduce, e.g., "statement_codebert".
- Step 3: cd to its models folder, e.g., "saved_models/checkpoint-best-f1".
- Step 4: download the models via "sh download_models.sh", then return to the baseline folder via "cd ../..".
- Step 5: find the training script named "train_xyz.sh" (e.g., train_multi_task_baseline_codebert.sh) and run it via "sh train_xyz.sh".
To run inference, find the script named "test_xyz.sh" and run it via "sh test_xyz.sh".
If "test_xyz.sh" does not exist, remove the "do_train" argument from "train_xyz.sh" (keeping "do_test") and run inference via "sh train_xyz.sh".

A concrete example is provided as follows:
- Statement-Level CodeBERT
  - Retrain
```
cd baselines/statement_codebert/saved_models/checkpoint-best-f1
sh download_models.sh
cd ../..
sh train_multi_task_baseline_codebert.sh
cd ../..
```
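Steps 1 to 5 can be wrapped in one function so that every baseline is downloaded and run the same way. This is a sketch under assumptions: `run_baseline` is a hypothetical name, it must be called from the repository root, and it assumes each baseline folder follows the `saved_models/checkpoint-best-f1/download_models.sh` layout used in the example above.

```shell
# Hypothetical helper (not part of the repository): download the checkpoint for
# one baseline and run one of its scripts, mirroring Steps 1-5 above.
# Usage (from the repository root): run_baseline <baseline_folder> <script_name>
run_baseline() {
    (
        # Steps 1-2: enter the chosen baseline folder (stop if it is missing)
        cd "baselines/$1" || exit 1
        # Steps 3-4: download the saved models inside the checkpoint folder
        ( cd saved_models/checkpoint-best-f1 && sh download_models.sh ) || exit 1
        # Step 5: run the requested training or inference script
        sh "$2"
    )
}

# Example, equivalent to the Statement-Level CodeBERT commands above:
# run_baseline statement_codebert train_multi_task_baseline_codebert.sh
```

The outer subshell means an `exit` on failure only ends the helper, not your interactive shell, and you stay in the repository root afterwards.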

Reproduce Ablation Study (Table 2 in the paper)

To reproduce w/o vulnerability codebook & matching, run the following commands:
- Retrain (skip "sh train_phase_one.sh" if you only want to run inference)
```
cd our_method/optimatch/saved_models/checkpoint-best-f1
sh download_models.sh
cd ../..
sh train_phase_one.sh
sh test_phase_one.sh
cd ../..
```

Each ablation trial (except w/o vulnerability codebook & matching) consists of phase 1 and phase 2 training, like our OPTIMATCH approach. First, cd to the folder containing the trial you are interested in. To retrain the models of either phase, run "train_xyz.sh"; to run inference for either phase, run "test_xyz.sh".

To reproduce w/o RNN embedding (mean pooling applied), cd to "./ablation/token_embedding_pooling_mean"
To reproduce w/o RNN embedding (max pooling applied), cd to "./ablation/token_embedding_pooling_max"
To reproduce OPTIMATCH with N vulnerability centroids, cd to "./ablation/num_patterns"
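To run inference for several ablation trials in one go, you could loop over their folders. This sketch assumes, without confirmation from the repository, that a trial folder holds its inference scripts directly and that they match `test_*.sh`; `run_ablation_tests` is a hypothetical name. The shell expands the glob in sorted order, so a phase 1 script runs before a phase 2 script only if the file names sort that way (e.g. `test_phase_1...` before `test_phase_2...`). Check the actual script names and any required checkpoints first.

```shell
# Hypothetical helper (not part of the repository): run every test_*.sh script
# found directly inside one ablation folder, in sorted (glob) order.
# Usage (from the repository root): run_ablation_tests <folder>
run_ablation_tests() {
    (
        cd "$1" || exit 1
        found=0
        for script in test_*.sh; do
            # If nothing matches, the glob stays literal; skip that case
            [ -f "$script" ] || continue
            found=1
            echo "== $1/$script"
            # Stop at the first failing script
            sh "$script" || exit 1
        done
        # Report folders that have no inference scripts at this level
        [ "$found" -eq 1 ] || { echo "no test_*.sh in $1" >&2; exit 1; }
    )
}

# for d in ablation/token_embedding_pooling_mean ablation/token_embedding_pooling_max; do
#     run_ablation_tests "$d"
# done
```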

License

MIT License

Citation

Under review at ICML 2024.
