Curriculum Learning for Small Code Language Models

This repository contains code and models_checkpoint for the paper "Curriculum Learning for Small Code Language Models". It implements training and evaluation of code language models on the TinyPy dataset using various curriculum learning techniques.

You can access the paper using this link : https://aclanthology.org/2024.acl-srw.44/

Directory Structure

├── README.md
├── code
│   ├── evaluate_*.py - Evaluation scripts
│   ├── model.py - Model definition
│   ├── train_*.py - Training scripts for different techniques
│   └── pycache
├── data
│   ├── TinyPy - Raw TinyPy dataset
│   │   ├── [Split] - Data splits
│   └── TinyPy_processed - Preprocessed binary data  
├── models_checkpoint - Saved model checkpoints

Models

The following models are included:

baseline_model_1M.pth - Baseline model trained on all data
hybrid_cl_model_1M.pth - Hybrid curriculum learning
incremental_cl_model_1M.pth - Incremental curriculum learning
sequential_cl_model_1M.pth - Sequential curriculum learning

Data

The data folder should be added manually by downloading the dataset from Kaggle:

Create a new folder named data inside the code folder.
Go to the Kaggle dataset TinyPy for Curriculum Learning.
Download the dataset.
Extract the contents and place them in the data folder.

Dependencies

numpy
pandas
torch
tqdm
wandb (optional)
fuzzywuzzy

Usage

Train a model

To train a model, use one of the training scripts. For example, to train the baseline model:

python train_baseline.py

To train with hybrid curriculum learning:

python code/train_hybrid_cl.py

To train with incremental curriculum learning:

python code/train_incremental_cl.py

To train with sequential curriculum learning:

python code/train_sequential_cl.py

Evaluate a model

To evaluate a model, use one of the evaluation scripts. For example, to evaluate the code execution:

python evaluate_code_execution.py

To evaluate code completion at the token level:

python code/evaluate_code_completion_tokenlevel.py

To evaluate code completion at the line level:

python code/evaluate_code_completion_linelevel.py

Acknowledgement

This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
code		code
models_checkpoint		models_checkpoint
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Curriculum Learning for Small Code Language Models

Directory Structure

Models

Data

Dependencies

Usage

Train a model

Evaluate a model

Acknowledgement

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

kamel-yamani/CurriculumLearningForSmallCodeLanguageModels

Folders and files

Latest commit

History

Repository files navigation

Curriculum Learning for Small Code Language Models

Directory Structure

Models

Data

Dependencies

Usage

Train a model

Evaluate a model

Acknowledgement

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages