CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure
- 2022/12/10: Please check our slides. 🐈
- 2022/10/12: Our code is available. 😋
- 2022/10/06: Released the CAT-probing paper; check it out. 👏
- 2022/10/06: CAT-probing is accepted by Findings of EMNLP 2022 🎉
We propose a metric-based probing method, CAT-probing, to quantitatively evaluate how CodePTMs' attention scores relate to distances between AST nodes.
More details are provided in our EMNLP'22 paper.
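The core idea can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: it binarizes an attention map (strongly attended token pairs) and an AST distance map (pairs of nodes close in the tree), then measures their agreement. The function name `cat_score` and the threshold values are assumptions chosen for illustration; see the paper for the actual metric definition.

```python
import numpy as np

def cat_score(attention, ast_distance, attn_threshold=0.1, dist_threshold=2):
    """Illustrative CAT-score-style metric (thresholds are placeholders).

    attention:    (n, n) attention weights between code tokens
    ast_distance: (n, n) pairwise distances between the tokens' AST nodes
    Returns the fraction of strongly-attended token pairs whose AST
    nodes are also close in the tree.
    """
    a = attention > attn_threshold       # token pairs with strong attention
    d = ast_distance <= dist_threshold   # token pairs close in the AST
    # agreement: strongly-attended pairs that are also AST-close
    return (a & d).sum() / max(a.sum(), 1)

# Toy 2-token example: every strongly-attended pair is AST-close.
attn = np.array([[0.5, 0.05], [0.2, 0.3]])
dist = np.array([[0, 1], [1, 0]])
score = cat_score(attn, dist)  # 1.0 for this toy input
```

A higher score for a given layer suggests that layer's attention aligns more closely with the code's syntactic structure.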
conda create --name cat python=3.7
conda activate cat
pip install -r requirements.txt
git clone https://github.com/nchen909/CodeAttention
cd CodeAttention/evaluator/CodeBLEU/parser
bash build.sh
cd ../../../
cp evaluator/CodeBLEU/parser/my-languages.so build/
# make sure git-lfs is installed, e.g.:
apt-get install git-lfs
bash get_models.sh
The dataset we use comes from CodeSearchNet.
mkdir data
cd data
pip install gdown
gdown 'https://drive.google.com/uc?export=download&id=1t8GncfPknpumOKbgUXux-EkuYnOZ6EfW'
unzip data.zip
rm data.zip
Point WORKDIR in run.sh and run_att.sh to your path.
export MODEL_NAME=
export TASK="summarize"
export SUB_TASK=
bash run.sh $MODEL_NAME $TASK $SUB_TASK
- MODEL_NAME can be any one of ["roberta", "codebert", "graphcodebert", "unixcoder"].
- SUB_TASK can be any one of ["go", "java", "javascript", "python"].
# first point WORKDIR in run_att.sh to your path
export MODEL_NAME=
export TASK="summarize"
export SUB_TASK=
export LAYER_NUM=
bash run_att.sh $MODEL_NAME $TASK $SUB_TASK $LAYER_NUM
- MODEL_NAME can be any one of ["roberta", "codebert", "graphcodebert", "unixcoder"].
- SUB_TASK can be any one of ["go", "java", "javascript", "python"].
- LAYER_NUM can be any one of [0-11] or -1 (11 refers to the last layer; -1 refers to all [0-11] layers).
Visualization results can be found in the att-vis-notebook folder.
Please consider citing us if you find this repository useful.👇
@inproceedings{chen2022cat,
title={CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure},
author={Chen, Nuo and Sun, Qiushi and Zhu, Renyu and Li, Xiang and Lu, Xuesong and Gao, Ming},
booktitle = {Findings of {EMNLP}},
year={2022}
}
This work has been supported by the National Natural Science Foundation of China under Grants No. U1911203, No. 62277017, and No. 61877018, by Alibaba Group through the Alibaba Innovation Research Program, by the Research Project of Shanghai Science and Technology Commission (No. 20dz2260300), and by the Fundamental Research Funds for the Central Universities.