
OPTIMATCH
Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities
(Reproduction of Experiments)

Table of contents

  1. How to reproduce
  2. License
  3. Citation

How to reproduce

Environment Setup

First, clone this repository to your local machine and enter its main directory with the following commands:

git clone https://github.com/optimatch/optimatch.git
cd optimatch

Then, install the Python dependencies with the following command:

pip install -r requirements.txt
  • We highly recommend checking the official installation guide for the "torch" (PyTorch) library so you can install the version appropriate for your device.

  • To use a GPU (optional), you also need to install the CUDA library; you may want to check its installation guide as well.

  • Python 3.9.7 is recommended; it has been fully tested without issues.
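A quick way to confirm the setup above is a couple of shell checks. The sketch below is a hypothetical helper, not part of the repository: `py_version`, `check_python` and `check_torch` are illustrative names, and it assumes a `python3` executable on your PATH (pass another interpreter name as the first argument otherwise).

```shell
# Hypothetical environment checks (not shipped with the repository).

# Print the X.Y.Z version of a Python interpreter (default: python3).
py_version() {
  "${1:-python3}" -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])'
}

# Compare the interpreter's version with the fully tested one (3.9.7).
# Returns 0 on an exact match, 1 on a mismatch (a warning only),
# and 2 if the interpreter cannot be run at all.
check_python() {
  tested="3.9.7"
  found=$(py_version "$1") || return 2
  if [ "$found" = "$tested" ]; then
    echo "Python $found matches the tested version"
    return 0
  fi
  echo "Python $found found; $tested is the fully tested version" >&2
  return 1
}

# Report the installed PyTorch version and whether it can see a CUDA device.
check_torch() {
  "${1:-python3}" -c 'import torch; print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())'
}
```

For example, `check_python && check_torch` reports the interpreter version and, if it matches, whether PyTorch detects a GPU. A version mismatch is reported with exit status 1 rather than aborting anything.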

Reproduction of Experiments

Download the necessary data and unzip it with the following commands:

cd data
sh download_data.sh 
cd ..
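If you prefer not to track `cd ..` by hand, the same download can run in a subshell, which restores the working directory automatically when it exits. This is a minimal sketch with hypothetical helper names (`run_in`, `fetch_data`), assuming it is sourced and called from the repository root:

```shell
# Hypothetical helpers (not part of the repository).

# Run a command from inside DIR without changing the caller's directory:
# the "( ... )" subshell confines the cd, so no "cd .." is needed afterwards.
run_in() {
  dir=$1
  shift
  if [ ! -d "$dir" ]; then
    echo "run_in: directory '$dir' not found" >&2
    return 1
  fi
  ( cd "$dir" && "$@" )
}

# Download and unzip the datasets; assumes the current directory is the repo root.
fetch_data() {
  run_in data sh download_data.sh
}
```

`fetch_data` leaves you in the repository root whether or not the download succeeds, and its exit status is that of `download_data.sh`.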

Reproduce Main Results (Table 1 in the paper)

  • OPTIMATCH (proposed approach)

    • Inference
    cd our_method/optimatch/saved_models/checkpoint-best-f1
    sh download_models.sh
    cd ../..
    sh test_phase_2_150pat.sh
    cd ../..
    
    • Retrain Phase 1 Model
    cd our_method/optimatch
    sh train_phase_1.sh
    cd ../..
    
    • Retrain Phase 2 Model
    cd our_method/optimatch
    sh train_phase_2_150pat.sh
    cd ../..
    
  • Baselines

    To reproduce baseline approaches, please follow the instructions below:

    • Step 1: cd to the "./baselines" folder
    • Step 2: cd to the folder of the baseline you wish to reproduce, e.g., "statement_codebert"
    • Step 3: cd to its models folder, e.g., "saved_models/checkpoint-best-f1"
    • Step 4: download the models via "sh download_models.sh", then go back via "cd ../.."
    • Step 5: find the shell script named "train_xyz.sh" (e.g., train_multi_task_baseline_codebert.sh) and run it via "sh train_xyz.sh"

    To run inference, find the shell script named "test_xyz.sh" and run it via "sh test_xyz.sh".
    If "test_xyz.sh" does not exist, remove the "do_test" command in "train_xyz.sh" and run inference via "sh train_xyz.sh".

    A concrete example is provided as follows:

    • Statement-Level CodeBERT
      • Retrain
      cd baselines/statement_codebert/saved_models/checkpoint-best-f1
      sh download_models.sh
      cd ../..
      sh train_multi_task_baseline_codebert.sh
      cd ../..
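The Table 1 commands can also be collected into a few shell functions so that every step starts and ends in the repository root. The sketch below is hypothetical (`optimatch_infer`, `optimatch_train` and `baseline_run` are illustrative names, not scripts shipped with the repository); it only reuses the folder and script names listed above, and each `cd` happens inside a subshell.

```shell
# Hypothetical helpers for the Table 1 commands; call them from the repository root.
# Every cd happens inside a "( ... )" subshell, so the caller's directory never changes.
OPTIMATCH_DIR=our_method/optimatch

# Download the best-F1 checkpoint, then run phase 2 inference (150 patterns).
optimatch_infer() {
  ( cd "$OPTIMATCH_DIR/saved_models/checkpoint-best-f1" && sh download_models.sh ) &&
    ( cd "$OPTIMATCH_DIR" && sh test_phase_2_150pat.sh )
}

# Retrain one phase: "1" runs train_phase_1.sh, "2" runs train_phase_2_150pat.sh.
optimatch_train() {
  case "$1" in
    1) script=train_phase_1.sh ;;
    2) script=train_phase_2_150pat.sh ;;
    *) echo "usage: optimatch_train 1|2" >&2; return 1 ;;
  esac
  ( cd "$OPTIMATCH_DIR" && sh "$script" )
}

# Baselines, steps 1-5: NAME is a folder under ./baselines and SCRIPT is a
# train_xyz.sh or test_xyz.sh inside it.
baseline_run() {
  dir="baselines/$1"
  if [ ! -f "$dir/$2" ]; then
    echo "baseline_run: $dir/$2 not found" >&2
    return 1
  fi
  ( cd "$dir/saved_models/checkpoint-best-f1" && sh download_models.sh ) &&
    ( cd "$dir" && sh "$2" )
}
```

For instance, `optimatch_infer` reproduces the OPTIMATCH inference step, `optimatch_train 1 && optimatch_train 2` retrains both phases in the order given above, and `baseline_run statement_codebert train_multi_task_baseline_codebert.sh` mirrors the Statement-Level CodeBERT example.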
      

Reproduce Ablation Study (Table 2 in the paper)

  • To reproduce w/o vulnerability codebook & matching, run the following commands:
    • Retrain (skip "sh train_phase_one.sh" if running inference only)
      cd our_method/optimatch/saved_models/checkpoint-best-f1
      sh download_models.sh
      cd ../..
      sh train_phase_one.sh
      sh test_phase_one.sh
      cd ../..
      

Each ablation trial (except w/o vulnerability codebook & matching) consists of phase 1 and phase 2 training, like our OPTIMATCH approach. First, cd to the folder that contains the trial you are interested in. To retrain the model for either phase, run "train_xyz.sh"; to run inference for either phase, run "test_xyz.sh".

  • To reproduce w/o RNN embedding (mean pooling applied), cd to "./ablation/token_embedding_pooling_mean"
  • To reproduce w/o RNN embedding (max pooling applied), cd to "./ablation/token_embedding_pooling_max"
  • To reproduce OPTIMATCH with N vulnerability centroids, cd to "./ablation/num_patterns"
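Because the exact script names inside each ablation folder are not listed here, a small helper can first list them and then run the chosen ones in order (phase 1 before phase 2). This is a hypothetical sketch: `ablation_list` and `ablation_run` are illustrative names, and it assumes the folders follow the "train_xyz.sh" / "test_xyz.sh" naming described above.

```shell
# Hypothetical helpers for the ablation trials (not part of the repository).

# List the train_*.sh and test_*.sh scripts in a trial folder.
ablation_list() {
  for s in "$1"/train_*.sh "$1"/test_*.sh; do
    [ -f "$s" ] && basename "$s"
  done
  return 0
}

# Run the given scripts in order inside the trial folder, stopping at the
# first failure; the subshell keeps the caller's directory unchanged.
ablation_run() {
  trial=$1
  shift
  if [ ! -d "$trial" ]; then
    echo "ablation_run: $trial not found" >&2
    return 1
  fi
  (
    cd "$trial" || exit 1
    for s in "$@"; do
      echo ">> $trial: sh $s"
      sh "$s" || exit 1
    done
  )
}
```

For example, run `ablation_list ablation/num_patterns` to see the available scripts, then pass the listed phase 1 and phase 2 scripts to `ablation_run ablation/num_patterns` in that order. The w/o vulnerability codebook & matching trial corresponds to `ablation_run our_method/optimatch train_phase_one.sh test_phase_one.sh` once its models have been downloaded.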

License

MIT License

Citation

Under review at ICML 2024.
