# **GenAI Paper: Enhancing Code Reasoning in StarCoder2 Using Parameter-Efficient Fine-Tuning Techniques**

#### The objective of this project is to fine-tune the large language model StarCoder2-3B using the OpenCodeReasoning dataset, which focuses on code reasoning tasks. The goal is to enhance the model’s ability to understand and explain code structures, identify bugs, and clarify programming concepts. To make the fine-tuning process efficient, the project leverages Supervised Fine-Tuning (SFT) along with LoRA/QLoRA techniques for parameter-efficient adaptation. This enables deployment in TinyLLM settings with reduced memory and compute requirements.

## Step 1: Installing the libraries for environment setup

In [1]:
# STEP 1: Environment Setup (Run this in your notebook/shell first)
!pip uninstall accelerate peft bitsandbytes transformers trl -y -q
!pip install accelerate peft==0.13.2 bitsandbytes transformers trl==0.12.0 wandb -q
!pip install huggingface_hub

## Step 2: Import required libraries for LLM fine-tuning

In [2]:
# STEP 2: Imports
import os
import torch
import wandb
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    EarlyStoppingCallback,
    pipeline,
)
from peft import LoraConfig
from trl import SFTTrainer

## Step 3: Specify the Base Model and Dataset from Hugging Face Hub

In [3]:
# STEP 3: Configuration
model_identifier = "bigcode/starcoder2-3b"
formatted_dataset = "nvidia/OpenCodeReasoning"


## Step 3 (continued): Specify LoRA and QLoRA hyperparameters for fine-tuning

In [4]:
# LoRA hyperparameters
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
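
LoRA keeps the pretrained weight matrix frozen and learns a low-rank update `B @ A` that is scaled by `lora_alpha / r`, so with the values above the update is scaled by 16 / 64 = 0.25. The sketch below illustrates this on a toy 2x2 layer using plain Python lists. The helper names (`matvec`, `lora_forward`) and the toy rank of 1 are illustrative assumptions; PEFT performs the equivalent computation internally on tensors.

```python
# Toy illustration of the LoRA forward pass: y = W x + scaling * B (A x).
# Helper names and toy sizes are hypothetical, chosen only for illustration.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, scaling):
    """Frozen base output plus the scaled low-rank update."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # B (d_out x r) applied to A (r x d_in) applied to x
    return [b + scaling * u for b, u in zip(base, update)]

# Scaling factor implied by the notebook's hyperparameters.
scaling = 16 / 64  # lora_alpha / lora_r

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight (identity, for readability)
A = [[1.0, 1.0]]              # trainable, shape (r, d_in) with toy rank r = 1
x = [2.0, 3.0]

# B starts at zero, so the adapted layer initially matches the frozen layer.
B_init = [[0.0], [0.0]]
y_start = lora_forward(W, A, B_init, x, scaling)

# After training, B is non-zero and the scaled update shifts the output.
B_trained = [[4.0], [0.0]]
y_trained = lora_forward(W, A, B_trained, x, scaling)

print(scaling, y_start, y_trained)
```

Because the update is multiplied by `lora_alpha / r`, raising `r` while leaving `lora_alpha` fixed shrinks the contribution of each adapter, which is one reason the two values are usually tuned together.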

In [5]:
# QLoRA config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16
)
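
To give a rough sense of what this configuration saves, the sketch below estimates weight storage at several precisions. It assumes about 3 billion parameters (suggested by the model name), a quantization block size of 64 with 32-bit scales, and, for double quantization, 8-bit scales with a second-level block size of 256, as described in the QLoRA paper. The function names are hypothetical, and the estimate ignores layers that stay unquantized as well as activations, gradients, and optimizer state.

```python
# Back-of-the-envelope weight-memory estimate. All sizes below are assumptions.

def bits_per_param(double_quant, block_size=64, second_block_size=256):
    """4-bit weights plus the per-block quantization-constant overhead, in bits."""
    if double_quant:
        # 8-bit scales per block, plus 32-bit second-level scales shared across blocks
        overhead = 8 / block_size + 32 / (block_size * second_block_size)
    else:
        # one 32-bit scale per block
        overhead = 32 / block_size
    return 4 + overhead

def weight_gib(num_params, bits):
    """Convert a parameter count and bits-per-parameter into GiB."""
    return num_params * bits / 8 / 2**30

NUM_PARAMS = 3_000_000_000  # assumption: roughly 3B parameters

for label, bits in [
    ("fp32", 32),
    ("fp16", 16),
    ("nf4", bits_per_param(double_quant=False)),
    ("nf4 + double quant", bits_per_param(double_quant=True)),
]:
    print(f"{label:>20}: {bits:6.3f} bits/param, {weight_gib(NUM_PARAMS, bits):5.2f} GiB")
```

Under these assumptions, double quantization trims the per-parameter overhead from 0.5 bits to about 0.127 bits, a small but free saving on top of the 4-bit weights.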

In [6]:
# Device map: Load the entire model onto CUDA device 0
device_map = {"": 0}


## Step 4: Loading the pre-trained starcoder2-3b model

In [7]:
# STEP 4: Load Model & Tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_identifier,
    quantization_config=bnb_config,  # load weights in 4-bit NF4 (QLoRA)
    device_map=device_map,
    trust_remote_code=True
)
model.config.use_cache = False  # the KV cache is not needed for training
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_identifier, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
tokenizer.padding_side = "right"

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



## Step 5: Loading the dataset

In [8]:
# STEP 5: Load Dataset (first 10,000 samples of split_0)
from datasets import load_dataset

# The dataset offers two configs, 'split_0' and 'split_1'; this notebook uses 'split_0'
dataset = load_dataset(formatted_dataset, "split_0", split="split_0[:10000]")


In [17]:
dataset.column_names

['id',
 'input',
 'output',
 'source',
 'license',
 'dataset',
 'split',
 'difficulty',
 'solution']

In [19]:
dataset[0]["input"]

'Problem description.\nVipul is a hardworking super-hero who maintains the bracket ratio of all the strings in the world. Recently he indulged himself in saving the string population so much that he lost his ability for checking brackets (luckily, not permanently ).Being his super-hero friend\xa0help him in his time of hardship.\nInput\n\nThe first line of the input contains an integer T denoting the number of test cases. The description of T test cases follows.\nThe first line of each test case contains a single string S denoting the string to be checked.\n\n\nOutput\n\nFor each test case, output a single line printing "YES" or "NO" (without " " and in uppercase only) , denoting if the brackets in the given string is balanced or not .\n\n\nConstraints\n\n1 ≤ T ≤ 10\n1 ≤ length of S ≤ 60\n\n\nExample\nInput:\n3\n((()))\n(())()\n()(()\n\nOutput:\nYES\nYES\nNO\n\n\xa0\n\nExplanation\nExample is self-explanatory.'
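
The column list above shows that each record carries both a problem statement (`input`) and a response (`output`). One common way to prepare such records for supervised fine-tuning is to join the two into a single training string. The sketch below shows the idea on a hand-written record. The function name `format_example`, the section markers, and the sample record are all hypothetical, and this is not what the notebook trains on: the trainer in Step 8 reads the `input` column directly.

```python
# Hypothetical formatting helper: joins a problem statement and its response
# into one training string. The markers are illustrative, not a model-specific template.

def format_example(record):
    """Return a dict with a single 'text' field built from 'input' and 'output'."""
    prompt = record["input"].strip()
    response = record["output"].strip()
    text = f"### Problem\n{prompt}\n\n### Response\n{response}"
    return {"text": text}

# Hand-written stand-in for a dataset row (not taken from OpenCodeReasoning).
sample = {
    "input": "Check whether a string of brackets is balanced.\n",
    "output": "Track the nesting depth; the string is balanced if it never drops below zero and ends at zero.\n",
}

formatted = format_example(sample)
print(formatted["text"])
```

With the `datasets` library, a function of this shape can be applied to every row via `dataset.map(format_example)`, which would add a `text` column alongside the existing ones.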

## Step 6: Setting up the configuration for the LoRA fine-tuning method

In [10]:
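# STEP 6: PEFT/LoRA config (reuses the hyperparameters defined in Step 3)
peft_config = LoraConfig(
    r=lora_r,                   # rank of the low-rank update (64)
    lora_alpha=lora_alpha,      # scaling numerator (16)
    lora_dropout=lora_dropout,  # dropout on the LoRA branch (0.1)
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"]  # attention projections only
)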

## Step 7: Creating a training configuration by setting the training parameters

In [11]:
# STEP 7: Training arguments
# Note: no eval dataset or EarlyStoppingCallback is passed to the trainer, so early stopping is not active.

training_args = TrainingArguments(
    output_dir="./starcoder2_qlora_results",
    num_train_epochs=1,               # overridden by max_steps below
    max_steps=500,                    # hard cap on optimizer steps
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,     # unused here: no evaluation set is provided
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    weight_decay=0.01,
    optim="paged_adamw_32bit",
    save_steps=500,
    save_total_limit=2,
    logging_steps=25,
    fp16=True,                        # matches bnb_4bit_compute_dtype=torch.float16
    bf16=False,
    warmup_ratio=0.05,
    group_by_length=True,
    lr_scheduler_type="cosine",
    gradient_checkpointing=True,
    report_to="wandb"
)
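
With `max_steps=500` and `warmup_ratio=0.05`, the learning rate ramps up linearly over roughly the first 25 steps and then follows a cosine decay toward zero. The sketch below reproduces that shape in plain Python so individual steps can be inspected. `lr_at_step` is a hypothetical helper written for illustration; the values the Trainer actually uses may differ slightly, for example in how the warmup step count is rounded.

```python
import math

def lr_at_step(step, base_lr=1e-4, max_steps=500, warmup_ratio=0.05):
    """Linear warmup followed by a half-cosine decay to zero (illustrative sketch)."""
    warmup_steps = round(max_steps * warmup_ratio)
    if step < warmup_steps:
        # linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # fraction of the post-warmup schedule that has elapsed, in [0, 1]
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 10, 25, 100, 250, 500):
    print(f"step {step:>3}: lr = {lr_at_step(step):.2e}")
```

The effective batch size under this configuration is `per_device_train_batch_size * gradient_accumulation_steps = 4` on a single GPU, so each of the 500 steps consumes four examples.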

## Step 8: Creating the Supervised Fine-Tuning Trainer

In [12]:
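# STEP 8: Initialize Trainer
# Passing tokenizer/dataset_text_field/max_seq_length/packing here is deprecated
# in this trl version (see the warning below); SFTConfig is the newer home for them.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    dataset_text_field="input",  # trains on the problem statements only; 'output' is not used
    max_seq_length=None,
    packing=False,
)

# STEP 8.1: Print PEFT parameter summary
def print_trainable_parameters(model):
    """Print how many parameters require gradients versus the total count."""
    trainable = 0
    total = 0
    for param in model.parameters():
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    print(f"\nTrainable params: {trainable:,}")
    print(f"Total params: {total:,}")
    print(f"Trainable %: {100 * trainable / total:.4f}%\n")

# Run on trainer.model, which is the PEFT-wrapped model
print_trainable_parameters(trainer.model)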


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.



No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.



Trainable params: 36,372,480
Total params: 1,627,573,248
Trainable %: 2.2348%
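
The trainable-parameter count above can be reproduced by hand. A LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` parameters. The sketch below applies that formula to the four targeted projections under assumed StarCoder2-3B dimensions: hidden size 3072, 30 transformer layers, and a 256-wide key/value projection from grouped-query attention. These dimensions are assumptions that are not stated in this notebook and should be checked against the model's `config.json`.

```python
# Reproduce the LoRA trainable-parameter count from assumed model dimensions.

R = 64             # LoRA rank used in this notebook
HIDDEN = 3072      # assumed hidden size
KV_DIM = 256       # assumed key/value projection width
NUM_LAYERS = 30    # assumed number of transformer layers

def lora_params(d_in, d_out, r):
    """Parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, HIDDEN, R)    # q_proj
    + lora_params(HIDDEN, KV_DIM, R)  # k_proj
    + lora_params(HIDDEN, KV_DIM, R)  # v_proj
    + lora_params(HIDDEN, HIDDEN, R)  # o_proj
)
trainable = per_layer * NUM_LAYERS

total_reported = 1_627_573_248  # "Total params" printed above
print(f"Trainable params: {trainable:,}")
print(f"Trainable %: {100 * trainable / total_reported:.4f}%")
```

The reported total of about 1.6 billion is well below the model's nominal 3 billion parameters. A likely explanation is that 4-bit weights are stored packed, two values per byte, so `numel()` undercounts the quantized layers; if so, the trainable share of the full model is smaller than the printed 2.23%.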



## Step 9: Training the model

In [None]:
# STEP 9: Train
trainer.train()





| Step | Training Loss |
|------|---------------|
| 25   | 3.3391        |
| 50   | 2.4720        |
| 75   | 2.1692        |
| 100  | 1.9281        |
| 125  | 1.8760        |
| 150  | 1.8379        |
| 175  | 1.7599        |
| 200  | 1.8132        |
| 225  | 1.7052        |
| 250  | 1.7665        |


TrainOutput(global_step=500, training_loss=1.8682943420410156, metrics={'train_runtime': 1011.6274, 'train_samples_per_second': 1.977, 'train_steps_per_second': 0.494, 'total_flos': 1.7946620865847296e+16, 'train_loss': 1.8682943420410156, 'epoch': 0.2})
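
The reported metrics are consistent with the training configuration: 500 steps at a per-device batch size of 4 with no gradient accumulation process 2,000 samples, which is 0.2 of the 10,000-sample subset. The short check below recomputes the derived figures from the values in `TrainOutput`. The variable names are illustrative, and a single GPU is assumed.

```python
# Recompute the derived training metrics from the configuration and runtime.

global_step = 500
per_device_batch_size = 4
gradient_accumulation_steps = 1
num_gpus = 1                    # assumption: single-GPU Colab session
dataset_size = 10_000           # rows loaded in Step 5
train_runtime = 1011.6274       # seconds, from TrainOutput

samples_seen = global_step * per_device_batch_size * gradient_accumulation_steps * num_gpus
epoch_fraction = samples_seen / dataset_size
samples_per_second = samples_seen / train_runtime
steps_per_second = global_step / train_runtime

print(f"samples seen:       {samples_seen}")
print(f"epoch fraction:     {epoch_fraction}")
print(f"samples per second: {samples_per_second:.3f}")
print(f"steps per second:   {steps_per_second:.3f}")
```

Since the run covers only a fifth of the subset, the final loss reflects a partial pass over the data rather than a full epoch.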

## Step 10: Chatting with and validating the model

In [None]:
# STEP 10: Model Inference
import re

import torch
from tabulate import tabulate

device = "cuda" if torch.cuda.is_available() else "cpu"
model.eval()  # disable dropout for generation

questions = [
    "Explain what is a linked list.",
    "What is the difference between a stack and a queue?",
    "Explain Python function to reverse a string."
]

results = []

for q in questions:
    # <|user|> / <|assistant|> are plain-text markers here; the fine-tuning data
    # above was the raw 'input' column and did not use this template.
    prompt = f"<|user|>\n{q}\n<|assistant|>\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=150,
            do_sample=False,           # deterministic decoding
            num_beams=3,               # beam search with 3 beams
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode and keep only the text after the assistant marker
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    answer = decoded.split("<|assistant|>")[-1].strip()

    # Remove artifacts like repeated prompts or template tokens
    answer = re.sub(r'<\|.*?\|>', '', answer).strip()
    answer = re.sub(r'\n+', ' ', answer)  # collapse newlines
    answer = re.sub(r'\s{2,}', ' ', answer)  # remove extra spaces

    results.append([q, answer])

# Display clean table
print(tabulate(results, headers=["Question", "Model Response"], tablefmt="grid"))


+-----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Question                                            | Model Response                                                                                                                                                                                                                                        