# Finetuning Starcoder to build a Python Copilot
_Authored by: [Chandrahas Aroori](https://github.com/Exorust)_

This notebook demonstrates how to finetune the popular coding LLM StarCoder2 as a Python copilot.

In this notebook we take a look at a simple way to finetune StarCoder2; the same method can be applied to other generative code applications.

## Model Dependencies

Let us install the dependencies for the model.

In [None]:
!pip install -U git+https://github.com/huggingface/transformers.git
!pip install -U accelerate
!pip install -U "datasets>=2.16.1"
!pip install -U bitsandbytes
!pip install -U peft==0.8.2
!pip install -U wandb==0.16.3
!pip install -U huggingface_hub==0.20.3

## Imports

In [None]:
# Code adapted from https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/supervised_finetuning.py
# and https://huggingface.co/blog/gemma-peft
import multiprocessing
import os

import torch
import transformers
from accelerate import PartialState
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    Trainer,
    logging,
    set_seed,
)

# Set up basic arguments
os.makedirs("finetune_starcoder2", exist_ok=True)
set_seed(42)

logging.set_verbosity_error()

## Load the dataset

Here we use a Python copilot dataset, `matlok/python-copilot-training-from-many-repos-large`, to train StarCoder2 as a Python coding assistant.

In [None]:
data = load_dataset(
    "matlok/python-copilot-training-from-many-repos-large",
    split="train",
    num_proc=multiprocessing.cpu_count(),
) 
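Before training, each code sample has to become a batch of token IDs with labels for next-token prediction. The sketch below mimics, in plain Python, what a causal-LM collator such as transformers' `DataCollatorForLanguageModeling(mlm=False)` does: pad the sequences (right-padding here), build an attention mask, and copy `input_ids` into `labels` with padding positions set to -100 so the loss ignores them. The token IDs and `PAD_ID` are made-up values for illustration; the model itself shifts the labels by one position when computing the loss.

```python
# Hypothetical token IDs; a real tokenizer would produce these from source code.
PAD_ID = 0  # assumption: illustrative padding token ID
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def collate_causal_lm(sequences, pad_id=PAD_ID):
    """Right-pad a list of token-ID lists and build causal-LM labels."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded = seq + [pad_id] * n_pad
        input_ids.append(padded)
        # 1 for real tokens, 0 for padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        # labels are a copy of the inputs with every pad-valued token masked out of the loss
        labels.append([tok if tok != pad_id else IGNORE_INDEX for tok in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


batch = collate_causal_lm([[5, 8, 13, 21], [7, 9]])
print(batch["labels"])  # [[5, 8, 13, 21], [7, 9, -100, -100]]
```

Masking by token value rather than by position is also why reusing the EOS token as the padding token, a common choice, hides EOS tokens from the loss.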

## LoRA and PEFT config

Here we set up PEFT and bitsandbytes. The `BitsAndBytesConfig` reduces memory use by quantizing the model weights: instead of storing them as 32-bit floating-point values, the model is loaded in the 4-bit NF4 format, while computations are carried out in bfloat16.
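To see why 4-bit loading matters, the sketch below does the back-of-the-envelope arithmetic for weight storage only (it ignores activations, optimizer state and quantization constants), assuming a round 3 billion parameters for StarCoder2-3B. It then quantizes a few numbers with a simplified symmetric absmax scheme to signed 4-bit integers. This illustrates the idea only; it is not the NF4 data type bitsandbytes actually uses, whose 16 levels are placed according to a normal distribution.

```python
BYTES_PER_GIB = 1024**3
N_PARAMS = 3_000_000_000  # assumption: round figure for a "3B" model


def weight_memory_gib(n_params, bits_per_param):
    """Memory needed just to store the weights, in GiB."""
    return n_params * bits_per_param / 8 / BYTES_PER_GIB


for bits in (32, 16, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")


def quantize_absmax_4bit(values):
    """Map floats to signed 4-bit integers in [-7, 7] using one shared absmax scale."""
    scale = max(abs(v) for v in values) / 7
    if scale == 0:  # all-zero input: nothing to scale
        return [0] * len(values), 1.0
    return [round(v / scale) for v in values], scale


def dequantize(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]


weights = [0.12, -0.53, 0.91, -0.07]
codes, scale = quantize_absmax_4bit(weights)
print(codes)  # integers in [-7, 7]
print(dequantize(codes, scale))  # approximations of the original weights
```

The round trip loses at most half a quantization step per value, which is the precision traded for an 8x smaller footprint than 32-bit storage.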

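LoRA (Low-Rank Adaptation) avoids retraining the whole model: it freezes the pretrained weights and trains small low-rank update matrices attached to the layers listed in `target_modules`, so only a small fraction of the parameters is updated. Here the update matrices have rank `r=8`.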
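The parameter savings follow from the shapes involved. For a frozen weight matrix W of shape (d_out, d_in), LoRA learns B (d_out × r) and A (r × d_in) and computes W x + (alpha / r) · B A x. The sketch below uses plain Python lists, a hypothetical 3072 × 3072 projection and r = 8 (matching `lora_config`); the sizes and the tiny matrices in the forward pass are illustrative, not read from StarCoder2.

```python
def lora_param_counts(d_out, d_in, r):
    """Return (frozen parameters in W, trainable parameters in A and B)."""
    full = d_out * d_in        # parameters in the frozen weight W
    lora = r * (d_out + d_in)  # parameters in B (d_out x r) plus A (r x d_in)
    return full, lora


full, lora = lora_param_counts(3072, 3072, r=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# full: 9,437,184  lora: 49,152  ratio: 0.52%


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


# Tiny example: 2x2 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]    # r x d_in
B = [[0.0], [0.0]]  # d_out x r, zero-initialised so the adapter starts as a no-op
x = [2.0, 4.0]
print(lora_forward(W, A, B, x, alpha=1, r=1))  # [2.0, 4.0]: B = 0 leaves W x unchanged
```

Because B starts at zero, the adapted model initially behaves exactly like the base model, and training only has to learn the small correction B A.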


In [None]:
# config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
lora_config = LoraConfig(
    r=8,
    target_modules=[
        "q_proj",
        "o_proj",
        "k_proj",
        "v_proj",
        # note: StarCoder2's MLP layers are named c_fc/c_proj, so these three match no layers
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    task_type="CAUSAL_LM",
)
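PEFT attaches adapters to modules whose names end with one of the strings in `target_modules`. The sketch below approximates that suffix matching in plain Python over a hand-written, assumed subset of StarCoder2 module names (printing `[n for n, _ in model.named_modules()]` shows the real list). It shows that the attention projections match while `gate_proj`, `up_proj` and `down_proj` find nothing, because StarCoder2's MLP layers are named `c_fc` and `c_proj`.

```python
# Assumed, hand-written subset of module names for one StarCoder2 decoder layer.
MODULE_NAMES = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.c_fc",
    "model.layers.0.mlp.c_proj",
]

TARGET_MODULES = ["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"]


def is_targeted(name, targets):
    """Approximation of PEFT's list matching: exact name or a '.<target>' suffix."""
    return name in targets or any(name.endswith(f".{t}") for t in targets)


matched = [n for n in MODULE_NAMES if is_targeted(n, TARGET_MODULES)]
unused_targets = [
    t for t in TARGET_MODULES if not any(n.endswith(f".{t}") for n in MODULE_NAMES)
]
print(matched)         # the four self_attn projections
print(unused_targets)  # ['gate_proj', 'up_proj', 'down_proj']
```

Adding `"c_fc"` and `"c_proj"` to `target_modules` would be one way to adapt the MLP layers too; that is a suggestion, not what this notebook trains.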


## Training

Now we load the 4-bit base model, attach the LoRA adapters, and train!
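The constants in the next cell determine how much data each optimizer step sees. With gradient accumulation, the effective batch size is `MICRO_BATCH_SIZE × GRADIENT_ACCUMULATION_STEPS × number of devices`, and if sequences are truncated to `MAX_SEQ_LENGTH` tokens, that bounds the tokens per step. The arithmetic below uses the notebook's values and assumes a single GPU.

```python
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4
MAX_SEQ_LENGTH = 1024
MAX_STEPS = 10
NUM_DEVICES = 1  # assumption: single-GPU run

# Gradients from several micro-batches are summed before each optimizer update.
effective_batch = MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES
max_tokens_per_step = effective_batch * MAX_SEQ_LENGTH
sequences_seen = effective_batch * MAX_STEPS

print(f"effective batch size: {effective_batch}")                # 4
print(f"max tokens per optimizer step: {max_tokens_per_step}")   # 4096
print(f"sequences seen in MAX_STEPS steps: {sequences_seen}")    # 40
```

With `MAX_STEPS = 10` the run touches only about 40 examples, so it works as a quick smoke test; a real finetune would need a much larger `MAX_STEPS`.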

In [None]:
# Constants
MAX_SEQ_LENGTH = 1024
MAX_STEPS = 10
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 4
WEIGHT_DECAY = 0.01
BF16 = True

ATTENTION_DROPOUT = 0.1
LEARNING_RATE = 2e-4
LR_SCHEDULER_TYPE = "cosine"
WARMUP_STEPS = 100
NUM_PROC = None
PUSH_TO_HUB = True


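# load the tokenizer and model
token = os.getenv("HF_TOKEN")
tokenizer = transformers.AutoTokenizer.from_pretrained("bigcode/starcoder2-3b", token=token)
# the tokenizer may not define a pad token; reuse EOS so batches can be padded
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-3b",
    quantization_config=bnb_config,
    device_map={"": PartialState().process_index},
    attention_dropout=ATTENTION_DROPOUT,
    token=token,
)

# wrap the quantized base model with LoRA adapters; only the adapters are trainable
# (print_trainable_parameters() is a PEFT method, so it is only available after wrapping)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Trainer expects token IDs, not raw text, so tokenize the dataset first.
# Assumption: the source code lives in a column named "code"; check data.column_names
# and change TEXT_COLUMN if the dataset uses a different name.
TEXT_COLUMN = "code"
train_data = data.map(
    lambda batch: tokenizer(batch[TEXT_COLUMN], truncation=True, max_length=MAX_SEQ_LENGTH),
    batched=True,
    remove_columns=data.column_names,
    num_proc=NUM_PROC,
)

# setup the trainer
trainer = Trainer(
    model=model,
    train_dataset=train_data,
    # mlm=False builds causal-LM labels (a copy of input_ids with padding masked out)
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    args=transformers.TrainingArguments(
        per_device_train_batch_size=MICRO_BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        warmup_steps=WARMUP_STEPS,
        max_steps=MAX_STEPS,
        learning_rate=LEARNING_RATE,
        lr_scheduler_type=LR_SCHEDULER_TYPE,
        weight_decay=WEIGHT_DECAY,
        # match the bfloat16 compute dtype in bnb_config (bf16 needs an Ampere-or-newer GPU)
        bf16=BF16,
        fp16=not BF16,
        logging_strategy="steps",
        logging_steps=1,
        output_dir="finetune_starcoder2",
        optim="paged_adamw_8bit",
        seed=42,
        run_name="train-starcoder2-3b",
        report_to="wandb",
    ),
)

# launch
print("Training...")
trainer.train()

print("Saving the last checkpoint of the model")
model.save_pretrained(os.path.join("finetune_starcoder2", "final_checkpoint/"))
if PUSH_TO_HUB:
    trainer.push_to_hub("Upload model")
print("Training Done! 💥")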
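One interaction between the constants is worth checking: `WARMUP_STEPS = 100` is larger than `MAX_STEPS = 10`. The sketch below re-implements, in plain Python, the linear-warmup-then-cosine-decay multiplier that the `cosine` scheduler in transformers applies to the learning rate (written to follow the formula in `get_cosine_schedule_with_warmup`; treat it as an approximation). In this sketch none of the 10 steps gets past 10% of `LEARNING_RATE`, so the demo run never reaches the configured peak.

```python
import math

LEARNING_RATE = 2e-4
WARMUP_STEPS = 100
MAX_STEPS = 10


def cosine_with_warmup(step, warmup_steps, total_steps, num_cycles=0.5):
    """LR multiplier: linear warmup, then half a cosine wave down to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


lrs = [LEARNING_RATE * cosine_with_warmup(s, WARMUP_STEPS, MAX_STEPS) for s in range(MAX_STEPS)]
print(f"peak LR during the run: {max(lrs):.2e}")  # 1.80e-05, i.e. 9% of LEARNING_RATE

# For comparison: a warmup that fits inside the run (hypothetical value of 2 steps).
lrs_short = [LEARNING_RATE * cosine_with_warmup(s, 2, MAX_STEPS) for s in range(MAX_STEPS)]
print(f"peak LR with 2 warmup steps: {max(lrs_short):.2e}")  # 2.00e-04
```

Setting `WARMUP_STEPS` to a small fraction of `MAX_STEPS` (or using `warmup_ratio` in `TrainingArguments`) lets the schedule reach its peak before decaying.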
