
Replication Package of the paper "Automating Code-Related Tasks Through Transformers: The Impact of Pre-training" accepted at ICSE'23


RosaliaTufano/impact_pre-training

 
 


Impact Pre-training

This repository is the replication package of the paper "Automating Code-Related Tasks Through Transformers: The Impact of Pre-training".

The SLR folder contains the material from the systematic literature review. In particular:

  • SLR/queries.numbers contains the queries executed for each source;
  • SLR/data contains the collected papers.

The code folder contains the scripts to reproduce our experiments. In particular:

  • code/training contains the Google Colab scripts to run the pre-training and the fine-tuning. Note that you need a Google Colab Pro account to successfully run the scripts (on the TPUs);
  • code/cleaning contains the scripts we used to clean the dataset;
  • code/generate_mutants contains everything needed to generate mutants of given Java methods;
  • code/tokenizer contains the tokenizer model and vocabulary.

The results folder contains the statistical analysis, BLEU scores, and Levenshtein distances of the models' predictions.
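For readers unfamiliar with the Levenshtein distance mentioned above, the following is a minimal sketch of how it can be computed between a model prediction and its target. It is not taken from this repository: the function names are hypothetical, and both the whitespace tokenization and the normalization by the longer sequence are assumptions made for illustration, since this README does not state how the distance was computed in the paper.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or token lists)."""
    # Keep the shorter sequence in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(prediction, target):
    """Token-level distance divided by the longer sequence length.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    pred_tokens = prediction.split()
    target_tokens = target.split()
    longest = max(len(pred_tokens), len(target_tokens))
    if longest == 0:
        return 0.0
    return levenshtein(pred_tokens, target_tokens) / longest
```

Under these assumptions, a value of 0 means the prediction matches the target token by token, and larger values indicate that more token edits would be needed to turn one into the other.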

We stored all the processed data (pre-training and fine-tuning datasets) and the trained model checkpoints (for each model, only the final/best checkpoint) on Zenodo, available at the following links:


Languages

  • Jupyter Notebook 71.5%
  • Python 28.5%