This repository contains the code for prediction-preserving program simplification with the DD module, and the resulting simplified data, for our paper "Understanding Neural Code Intelligence Through Program Simplification", accepted at ESEC/FSE'21.
Artifact for Article (SIVAND): Reproducible Capsule. The repository is structured as follows:
```
├── ./                     # code for model-agnostic DD framework
├── data/
│   ├── selected_input     # randomly selected test inputs from different datasets
│   ├── simplified_input   # traces of simplified inputs for different models
│   └── summary_result     # summary results of all experiments as CSV
├── models/
│   ├── dd-code2seq        # DD module with code2seq model
│   ├── dd-code2vec        # DD module with code2vec model
│   └── dd-great           # DD module with RNN/Transformer models
├── others/                # related helper functions
└── save/                  # images of SIVAND
```
Delta Debugging (DD) was originally implemented in Python 2. We have modified the core modules (DD.py, MyDD.py) to run in Python 3 (specifically, Python 3.7.3), and then adapted the DD modules for prediction-preserving program simplification with different models. The approach, SIVAND, is model-agnostic and can be applied to any model by loading that model and querying its prediction for a task.
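For intuition, the core DD reduction loop (the classic ddmin algorithm) can be sketched as below. This is a simplified illustration, not the repository's actual DD.py; the `keeps_prediction` predicate stands in for the model query, and the toy predicate here is purely illustrative.

```python
# Minimal sketch of the ddmin algorithm adapted for prediction-preserving
# simplification: repeatedly try to delete chunks of tokens while the
# model's prediction (here, a toy predicate) stays unchanged.

def ddmin(tokens, keeps_prediction):
    """Greedily shrink `tokens` to a 1-minimal list that still satisfies
    `keeps_prediction` (i.e., the model still makes the same prediction)."""
    n = 2  # start by splitting the input into two chunks
    while len(tokens) >= 2:
        chunk = max(1, len(tokens) // n)
        subsets = [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]
        reduced = False
        for i in range(len(subsets)):
            # Try the complement of subset i (i.e., delete one chunk).
            candidate = [t for j, s in enumerate(subsets) if j != i for t in s]
            if candidate and keeps_prediction(candidate):
                tokens = candidate
                n = max(n - 1, 2)  # keep granularity after a successful delete
                reduced = True
                break
        if not reduced:
            if n >= len(tokens):
                break  # 1-minimal: no single chunk can be removed
            n = min(len(tokens), n * 2)  # refine granularity
    return tokens

# Toy predicate: the "prediction" is preserved as long as both the token
# "return" and the identifier "name" survive in the program.
keep = lambda toks: "return" in toks and "name" in toks
program = "def get name ( self ) : return self . name".split()
print(ddmin(program, keep))  # -> ['return', 'name']
```

In SIVAND the predicate is "the loaded model still predicts the original label", so the surviving tokens reveal what the model actually relies on.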
How to Start:
To apply SIVAND (for the MethodName task as an example), first update `<g_test_file>` (the path to a file that contains all selected inputs) and `<g_deltas_type>` (token- or char-type deltas for DD) in `helper.py`. Then, modify `load_model_M()` to load a target model (i.e., code2vec/code2seq) from `<model_path>`, and `prediction_with_M()` to get the predicted name, score, and loss value with `<model>` for an input `<file_path>`. Also, check whether `<code>` is parsable in `is_parsable()`, and load a method by language (e.g., Java) in `load_method()`. Finally, run `MyDD.py`, which will simplify the programs one by one and save all simplified traces in the `dd_data/` folder.
More Details:
Check the `models/dd-code2vec/` and `models/dd-code2seq/` folders to see how SIVAND works with the code2vec and code2seq models for the MethodName task on Java programs. Similarly, for the VarMisuse task (RNN & Transformer models, Python programs), check the `models/dd-great/` folder for our modified code.
Example of an original and minimized method where the target is to predict `onCreate`: the program is reduced while the code2vec model's predicted method name `onCreate` is preserved. The minimized example clearly shows that the model has learned to take shortcuts, in this case looking for the name in the function's body.
- Tasks:
  - [MN] MethodName
  - [VM] VarMisuse
- Models:
  - [MN] code2vec & code2seq
  - [VM] RNN & Transformer
- Datasets:
  - [MN] Java-Large
  - [VM] Py150
- Sample Inputs:
  - [MN] Correctly predicted samples, wrongly predicted samples
  - [VM] Buggy (correct location and target; wrong location), non-buggy (bug-free)
- Delta Types:
  - [MN] Token & Char
  - [VM] Token
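The two delta granularities can be illustrated as follows. This is a sketch of the idea only: each delta is an (index, unit) pair so that any subset of deltas can be reassembled in order; SIVAND's actual delta representation may differ.

```python
# Sketch of token-level vs. char-level deltas for DD. Indexing each unit
# lets DD delete arbitrary subsets and rebuild a candidate program.

def token_deltas(code):
    # One delta per whitespace-separated token.
    return list(enumerate(code.split()))

def char_deltas(code):
    # One delta per character (finer-grained, more DD steps).
    return list(enumerate(code))

def rebuild(deltas, sep=" "):
    # Reassemble the surviving deltas in their original order.
    return sep.join(unit for _, unit in sorted(deltas))

code = "int add(int a, int b) { return a + b; }"
print(len(token_deltas(code)), "token deltas")
print(len(char_deltas(code)), "char deltas")
# Both granularities round-trip the original program:
assert rebuild(token_deltas(code)) == code
assert rebuild(char_deltas(code), sep="") == code
```

Char-type deltas allow finer reductions (e.g., shortening identifiers) at the cost of many more model queries, which is why they are only used for the MethodName task above.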
The `data/summary_result/` folder contains summary results of all experiments as CSV; each file has the following fields:

- `filename`: ID for the input file of the `data/simplified_input` folder
- `model`: {code2vec, code2seq, RNN, or Transformer}
- `task`: METHOD_NAME or VARIABLE_MISUSE
- `filter_type`:
  - {token_correct, char_correct, or token_wrong} for task == METHOD_NAME
  - {buggy_correct, non_buggy_correct, or buggy_wrong_location} for task == VARIABLE_MISUSE
- `initial_score`: score of the actual program
- `final_score`: score of the minimal program
- `initial_loss`: loss of the actual program
- `final_loss`: loss of the minimal program
- `dd_pass`: total/valid/correct DD steps for reduction
- `dd_time`: total time spent for reduction
- `initial_program`: actual raw program
- `final_program`: minimal simplified program
- `initial_tokens`: tokens in the actual program
- `final_tokens`: tokens in the minimal program
- `len_{initial/final/minimal}_{tokens/chars}`: number of corresponding {tokens/chars}
- `per_removed_{chars/tokens}`: percentage of removed {chars/tokens}
- `attn_nodes`: top-k AST nodes based on attention score {k ~= len_final_nodes}
- `final_nodes`: all AST nodes in the minimal program
- `common_nodes`: common nodes between attention & reduction
- `len_{attn/final/common}_nodes`: number of corresponding AST nodes
- `per_common_tokens`: percentage of common nodes between attention & reduction
- `ground_truth`: True (for correct prediction) or False (for wrong prediction)
Note that a `<null>`, `-1`, or `<empty>` value indicates that the value is not available for that particular input/experiment.
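A summary file can be inspected with a few lines of Python. The rows below are made up for illustration; the column names (`per_removed_tokens`, `dd_time`, `ground_truth`) come from the field list above, and `-1` marks "not available" as just noted.

```python
# Sketch: compute the mean percentage of removed tokens over correctly
# predicted samples from a (made-up) data/summary_result/ style CSV.
import csv
import io

sample = io.StringIO(
    "filename,model,task,per_removed_tokens,dd_time,ground_truth\n"
    "f1,code2vec,METHOD_NAME,85.0,12.3,True\n"
    "f2,code2vec,METHOD_NAME,70.0,9.1,True\n"
    "f3,code2vec,METHOD_NAME,-1,-1,False\n"
)

removed = []
for row in csv.DictReader(sample):
    pct = float(row["per_removed_tokens"])
    # Keep correct predictions only, and skip the -1 "not available" marker.
    if row["ground_truth"] == "True" and pct != -1:
        removed.append(pct)

mean_removed = sum(removed) / len(removed)
print(f"mean % tokens removed (correct): {mean_removed}")  # 77.5
```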
Summary of reduction results in correctly predicted samples.
To cite our paper "Understanding Neural Code Intelligence through Program Simplification":
```bibtex
@inproceedings{rabin2021sivand,
  author    = {Rabin, Md Rafiqul Islam and Hellendoorn, Vincent J. and Alipour, Mohammad Amin},
  title     = {Understanding Neural Code Intelligence through Program Simplification},
  year      = {2021},
  isbn      = {9781450385626},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3468264.3468539},
  doi       = {10.1145/3468264.3468539},
  booktitle = {Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  pages     = {441–452},
  numpages  = {12},
  location  = {Athens, Greece},
  series    = {ESEC/FSE 2021}
}
```