CODESCRIBE

In this work, we propose CODESCRIBE, which models the hierarchical syntax structure of code by introducing a novel triplet position for code summarization. Specifically, CODESCRIBE leverages a graph neural network and a Transformer to preserve the structural and the sequential information of code, respectively. In addition, we propose a pointer-generator network that attends to both the structural and the sequential tokens of code for better summary generation. Experiments on two real-world datasets in Java and Python demonstrate the effectiveness of our approach compared with several state-of-the-art baselines.
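
For readers unfamiliar with pointer-generator decoding, here is a minimal sketch of the mixing step in PyTorch. It is an illustration only, not CODESCRIBE's actual implementation, and all the tensor names (vocab_dist, attn_dist, src_ids, p_gen) are our own assumptions:

    import torch

    def pointer_generator_mix(vocab_dist, attn_dist, src_ids, p_gen):
        # vocab_dist: (batch, vocab_size) softmax over the target vocabulary
        # attn_dist:  (batch, src_len)    attention weights over source tokens
        # src_ids:    (batch, src_len)    vocabulary ids of the source tokens
        # p_gen:      (batch, 1)          probability of generating vs. copying
        weighted_vocab = p_gen * vocab_dist          # generation distribution
        weighted_copy = (1.0 - p_gen) * attn_dist    # copy distribution
        # scatter the copy probabilities onto their vocabulary ids
        return weighted_vocab.scatter_add(1, src_ids, weighted_copy)

In CODESCRIBE, attention over both the structural and the sequential tokens of code would feed into such a copy distribution.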

Runtime Environment

  • 4 NVIDIA 2080 Ti GPUs
  • Ubuntu 16.04
  • CUDA 10.2 (with the corresponding cuDNN version)
  • Anaconda
    • Python 3.9 (base environment)
    • Python 2.7 (a virtual environment named python27)
  • PyTorch 1.9 for Python 3.9
  • PyTorch Geometric 1.7 for Python 3.9
  • Install our package with pip install my-lib-0.0.6.tar.gz in both the Python 3.9 and Python 2.7 environments (see the example commands after this list). The package can be downloaded from Baidu Netdisk or Google Drive.
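
As a concrete setup example (a sketch only, not an official script; it assumes Anaconda is already installed and my-lib-0.0.6.tar.gz has been downloaded to the working directory):

    conda create -n python27 python=2.7    # the Python 2.7 virtual environment
    pip install my-lib-0.0.6.tar.gz        # base (Python 3.9) environment
    conda activate python27
    pip install my-lib-0.0.6.tar.gz        # Python 2.7 environment
    conda deactivate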

Dataset

The complete Python and Java datasets can be downloaded from Baidu Netdisk or Google Drive.

Note that we provide 100 samples each for the train/valid/test sets in the directories data/python/raw_data/ and data/java/raw_data/. To run on the whole datasets, replace these samples with the downloaded data files, for example as shown below.
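
A possible way to do the replacement (the source paths are hypothetical; adjust them to wherever you unpacked the downloaded archives, keeping the file names of the provided samples):

    cp /path/to/downloaded/python/raw_data/* data/python/raw_data/
    cp /path/to/downloaded/java/raw_data/* data/java/raw_data/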

Experiment on the Python Dataset

  1. Step into the directory src_code/python/:
    cd src_code/python
    
  2. Preprocess the train/valid/test data:
    python s1_preprocessor.py
    conda activate python27
    python s2_preprocessor_py27.py
    conda deactivate
    python s3_preprocessor.py
    
  3. Run the model for training and testing:
    python s4_model.py

After running, the performance will be printed to the console, and the predicted results on the test data will be saved to data/python/result/result.json, with the ground truth and the code included for comparison.
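
To inspect individual predictions, a few lines of Python suffice (illustrative only, assuming result.json holds a JSON array; check the actual layout of the file first):

    import json

    with open("data/python/result/result.json") as f:
        results = json.load(f)
    print(results[0])  # one record with prediction, ground truth, and code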

We have provided the results on the test set, so you can obtain the evaluation results directly by running

python s5_eval_res.py

Note that:

  • All the parameters are set in src_code/python/config.py and src_code/python/config_py27.py.
  • If the model has already been trained, you can set the parameter train_mode (line 83 of config.py) to False, as sketched below. The test data will then be predicted directly with the model saved in data/python/model/.
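
The edit would look roughly like this (illustrative; whether the value is a boolean or a string depends on how config.py parses it, so check line 83 of your copy):

    train_mode = False  # load the saved model from data/python/model/ instead of training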

Experiment on the Java Dataset

  1. Step into the directory src_code/java/:
    cd src_code/java
    
  2. Preprocess the train/valid/test data:
    python s1_preprocessor.py
    
  3. Run the model for training and testing:
    python s2_model.py
    

After running, the performance will be printed to the console, and the predicted results on the test data will be saved to data/java/result/result.json, with the ground truth and the code included for comparison.

We have provided the results on the test set, so you can obtain the evaluation results directly by running

python s3_eval_res.py

Note that:

  • All the parameters are set in src_code/java/config.py.
  • If the model has already been trained, you can set the parameter train_mode (line 113 of config.py) to False, in the same way as for the Python dataset. The test data will then be predicted directly with the model saved in data/java/model/.

More Implementation Details

  • The main parameter settings for CODESCRIBE are shown below:

(figure: main parameter settings)

  • The time used per epoch and the memory usage are provided below:

(figure: time per epoch and memory usage)

Experimental Results

We provide part of our experimental results below.

  • Comparison with state-of-the-art baselines.

(figure: comparison with the baselines)

  • Qualitative examples.

(figure: qualitative examples)
