What do code-based Transformer models learn?

I will reorganize the folder structure, refactor the code, and add comments to make the project easier to understand once I have the time!

Project Organization

This is the final project for the course COMP 599: Natural Language Understanding with Deep Learning (more info here), which I completed with Adam Weiss and Hector Leos. We received the maximum grade of 85% (A) for the project.

Abstract

With the advent of state-of-the-art large-scale pre-trained Transformer-based models in various domains of Natural Language Processing, recent work has focused on understanding what these models learn about natural language. While some studies show that they learn grammatical structures, others cast doubt on such claims. Beyond natural language tasks, Transformer models such as CodeBERT have recently been applied to many tasks involving code. However, little work has been done to determine whether Transformers capture key structures of code. In this paper, we provide novel evidence that state-of-the-art code-clone detection models are largely invariant to random word-order permutations (i.e., they assign the same labels to code pairs whose tokens have been permuted as to those whose tokens have not). We provide a preliminary empirical evaluation of this phenomenon. Furthermore, we find evidence that Transformers are capable of capturing important syntactic structures, as has previously been shown for other ML models. Structures such as coreference are captured to a great extent. We discuss the implications of these puzzling findings for the broader effort to understand what these models are learning.

Our work falls within a general push to better understand how deep learning models function so they can be interpretable.
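The word-order permutation test described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in run_experiment.py or Permutation.ipynb: it assumes whitespace tokenization (CodeBERT itself works on subword tokens, so the project's permutation procedure may differ) and a hypothetical model(code_a, code_b) -> label callable standing in for a clone detector. The names permute_tokens, invariance_rate, and the toy bag_of_tokens_detector are illustrative only.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical signature (assumption): a clone detector takes two code
# snippets and returns a label (1 = clone, 0 = not a clone).
CloneDetector = Callable[[str, str], int]


def permute_tokens(code: str, rng: random.Random) -> str:
    """Randomly shuffle the whitespace-separated tokens of a code snippet."""
    tokens = code.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def invariance_rate(
    model: CloneDetector,
    pairs: List[Tuple[str, str]],
    seed: int = 0,
) -> float:
    """Fraction of pairs whose predicted label is unchanged after both
    snippets have their tokens randomly permuted."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    rng = random.Random(seed)  # fixed seed so permutations are reproducible
    unchanged = 0
    for a, b in pairs:
        original = model(a, b)
        permuted = model(permute_tokens(a, rng), permute_tokens(b, rng))
        unchanged += int(original == permuted)
    return unchanged / len(pairs)


def bag_of_tokens_detector(a: str, b: str) -> int:
    """Toy detector (hypothetical): Jaccard similarity over token sets.
    It ignores word order entirely, so it is invariant by construction."""
    sa, sb = set(a.split()), set(b.split())
    jaccard = len(sa & sb) / max(len(sa | sb), 1)
    return int(jaccard >= 0.5)


pairs = [
    ("int add(int a, int b) { return a + b; }",
     "int sum(int x, int y) { return x + y; }"),
    ("for (i = 0; i < n; i++) total += v[i];",
     "while (true) { break; }"),
]
print(invariance_rate(bag_of_tokens_detector, pairs))  # 1.0
```

A rate close to 1.0 for a real model would mean its predictions barely depend on token order, which is the kind of behaviour the abstract reports for code-clone detectors. The toy detector only serves as a sanity check: since it sees token sets, its rate is always exactly 1.0.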

COMP599_Milestone_3___Final-1.pdf
Permutation.ipynb
README.md
codebertsimilar.ipynb
json_data.json
run_experiment.py
