optimatch/README.md

OPTIMATCH
Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities
(Reproduction of Experiments)

Table of contents

  1. How to reproduce
  2. License
  3. Citation

How to reproduce

Environment Setup

First, clone this repository to your local machine and change into its root directory via the following commands:

git clone https://github.com/optimatch/optimatch.git
cd optimatch

Then, install the Python dependencies via the following command:

pip install -r requirements.txt
  • We highly recommend checking the installation guide for the "torch" library so that you install the version appropriate for your device.

  • To use a GPU (optional), you also need to install the CUDA library; see its installation guide.

  • Python 3.9.7 is recommended; it has been fully tested without issues.
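
Before installing the dependencies, it can help to confirm which interpreter is active. The following is a minimal sketch, not part of the repository: `version_matches` is a hypothetical helper that checks whether a version string falls in the 3.9 series of the recommended Python 3.9.7, and the script assumes that `python3` is the interpreter on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): succeed when the given
# version string belongs to the recommended major.minor series.
RECOMMENDED_SERIES="3.9"

version_matches() {
    case "$1" in
        "$RECOMMENDED_SERIES".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the active interpreter for its version, e.g. "3.9.7".
if command -v python3 >/dev/null 2>&1; then
    current=$(python3 -c 'import platform; print(platform.python_version())')
    if version_matches "$current"; then
        echo "Python $current is in the recommended $RECOMMENDED_SERIES series."
    else
        echo "Python $current found; this README recommends 3.9.7."
    fi
else
    echo "python3 not found on PATH."
fi
```

A different version may still work; the check only reports a mismatch with the version that was tested.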

Reproduction of Experiments

Download the necessary data and unzip it via the following commands:

cd data
sh download_data.sh 
cd ..

Reproduce Main Results (Table 1 in the paper)

  • OPTIMATCH (proposed approach)

    • Inference
    cd our_method/optimatch/saved_models/checkpoint-best-f1
    sh download_models.sh
    cd ../..
    sh test_phase_2_150pat.sh
    cd ../..
    
    • Retrain Phase 1 Model
    cd our_method/optimatch
    sh train_phase_1.sh
    cd ../..
    
    • Retrain Phase 2 Model
    cd our_method/optimatch
    sh train_phase_2_150pat.sh
    cd ../..
    
  • Baselines

    To reproduce baseline approaches, please follow the instructions below:

    • Step 1: cd to the "./baselines" folder
    • Step 2: cd to the folder of the baseline you wish to reproduce, e.g., "statement_codebert"
    • Step 3: cd to the models folder, e.g., "saved_models/checkpoint-best-f1"
    • Step 4: download the models via "sh download_models.sh", then "cd ../.."
    • Step 5: find the shell script named "train_xyz.sh" (e.g., train_multi_task_baseline_codebert.sh) and run it via "sh train_xyz.sh"

    To run inference, find the shell script named "test_xyz.sh" and run it via "sh test_xyz.sh".
    If "test_xyz.sh" does not exist, remove the "do_test" command in "train_xyz.sh" and run inference via "sh train_xyz.sh".

    A concrete example is provided as follows:

    • Statement-Level CodeBERT
      • Retrain
      cd baselines/statement_codebert/saved_models/checkpoint-best-f1
      sh download_models.sh
      cd ../..
      sh train_multi_task_baseline_codebert.sh
      cd ../..
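
Each recipe above changes directory several times, so it is easy to end up in the wrong folder when chaining them. The following is a minimal sketch of one way to avoid this, assuming a POSIX shell: the hypothetical helper `run_in` (not part of the repository) runs a script inside a subshell, so the caller's working directory is left unchanged.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run a script from inside
# a given directory without changing the caller's working directory.
run_in() {
    dir=$1
    script=$2
    # The parentheses start a subshell, so the cd is undone when it exits.
    ( cd "$dir" && sh "$script" )
}

# Example, assuming the directory layout described in this README and that
# the commands are issued from the repository root:
#   run_in our_method/optimatch/saved_models/checkpoint-best-f1 download_models.sh
#   run_in our_method/optimatch test_phase_2_150pat.sh
```

With this pattern every command can be issued from the repository root, and no trailing "cd .." is needed.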
      

Reproduce Ablation Study (Table 2 in the paper)

  • To reproduce w/o vulnerability codebook & matching, run the following commands:
    • Retrain (skip "sh train_phase_one.sh" if running inference only)
      cd our_method/optimatch/saved_models/checkpoint-best-f1
      sh download_models.sh
      cd ../..
      sh train_phase_one.sh
      sh test_phase_one.sh
      cd ../..
      

Each ablation trial (except w/o vulnerability codebook & matching) consists of phase 1 and phase 2 training, like our OPTIMATCH approach. First, cd to the folder containing the trial you are interested in. To retrain the model for either phase, run "train_xyz.sh". To run inference for either phase, run "test_xyz.sh".

  • To reproduce w/o RNN embedding (mean pooling applied), cd to "./ablation/token_embedding_pooling_mean"
  • To reproduce w/o RNN embedding (max pooling applied), cd to "./ablation/token_embedding_pooling_max"
  • To reproduce OPTIMATCH with N vulnerability centroids, cd to "./ablation/num_patterns"
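
Since every ablation trial follows the same "train_xyz.sh" / "test_xyz.sh" naming, the scripts available in each folder can be listed before deciding what to run. The following is a sketch under the assumption that the three folders named above exist under "./ablation" and that it is run from the repository root; `list_trial_scripts` is a hypothetical helper, not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the train_*.sh and
# test_*.sh scripts found directly inside a trial folder.
list_trial_scripts() {
    for script in "$1"/train_*.sh "$1"/test_*.sh; do
        # An unmatched glob stays as literal text, so keep only real files.
        [ -f "$script" ] && echo "$script"
    done
    return 0
}

# Walk the ablation folders named in this README; missing folders are skipped.
for trial in token_embedding_pooling_mean token_embedding_pooling_max num_patterns; do
    if [ -d "ablation/$trial" ]; then
        echo "== $trial =="
        list_trial_scripts "ablation/$trial"
    fi
done
```

The script only lists files; running a listed script is still done with "sh" from inside the trial folder, as described above.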

License

MIT License

Citation
