Published Models (to run PEFT models, see inference code below)
Instruction set used in instruction fine-tuning
Instruction set used in instruction fine-tuning (on HuggingFace)
Dataset used in task-specific fine-tuning
Evaluation benchmark dataset: Belebele
Evaluation benchmark dataset: XCOPA
Tokenizer used in vocabulary extension
Source code for instruction fine-tuning
Source code for task-specific fine-tuning
Source code for continual training
Source code for vocabulary extension
Source code for intrinsic evaluation (perplexity calculation)
Source code for extrinsic evaluation (task inference)
Source code for retrieving model checkpoints of an adapted model
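To run the published PEFT models, the adapter weights need to be loaded on top of the corresponding base model. The snippet below is a minimal sketch of such an inference step using the `transformers` and `peft` libraries; the base model ID, adapter ID, and prompt are placeholders, not the exact identifiers used by this project.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Llama-2-7b-hf"      # assumed base model, replace with the one used here
adapter_id = "your-username/llamaturk-adapter"  # placeholder adapter repository

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.float16, device_map="auto"
)

# Attach the fine-tuned PEFT (LoRA) adapter on top of the base model
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = "Türkiye'nin başkenti neresidir?"  # example Turkish prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```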
Citation:
@misc{toraman2024llamaturk,
title={LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language},
author={Cagri Toraman},
year={2024},
eprint={2405.07745},
archivePrefix={arXiv},
primaryClass={cs.CL}
}