Impact Pre-training

This repository is the replication package for the paper "Automating Code-Related Tasks Through Transformers: The Impact of Pre-training".

The SLR folder contains the material from the systematic literature review. In particular:

SLR/queries.numbers contains the queries executed for each source;
SLR/data contains the collected papers.

The code folder contains the scripts to reproduce our experiments. In particular:

code/training contains the Google Colab scripts to run the pre-training and the fine-tuning. Note that you need a Google Colab Pro account to successfully run the scripts (on the TPUs);
code/cleaning contains the scripts we used to clean the dataset;
code/generate_mutants contains everything needed to generate mutants of given Java methods;
code/tokenizer contains the tokenizer model and vocabulary.

The results folder contains the statistical analysis, BLEU scores, and Levenshtein distances of the models' predictions.
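
As a rough illustration of the edit-distance metric mentioned above, the following Python sketch computes a token-level Levenshtein distance between a prediction and its target. It is not the script used to produce the files in the results folder: the function names, the whitespace tokenization, and the normalization by the longer sequence are all assumptions made for this example.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    # Keep the shorter sequence in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(prediction, target):
    """Token-level distance divided by the longer token sequence.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    pred_tokens = prediction.split()
    target_tokens = target.split()
    longest = max(len(pred_tokens), len(target_tokens))
    if longest == 0:
        return 0.0
    return levenshtein(pred_tokens, target_tokens) / longest


if __name__ == "__main__":
    # Hypothetical prediction/target pair, for illustration only.
    print(normalized_levenshtein("return a + b ;", "return a - b ;"))
```

A normalized value of 0 means the prediction matches the target token for token, while values closer to 1 mean most tokens would have to change.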

We stored all the processed data (pre-training and fine-tuning datasets) and all the trained model checkpoints (for each model, only the final/best checkpoint) on Zenodo, available at the following links:

datasets: https://zenodo.org/record/7052859#.YyGtUewzZoY;
model checkpoints: https://zenodo.org/record/7078746#.YyGuG-wzZoY.

Repository: RosaliaTufano/impact_pre-training