# Supervised Fine-Tuning (SFT)

**Goal**

Given a foundation model (in this case Llama-2-7B) that was pretrained on a broad, general-purpose corpus, our goal is to fine-tune it on a specific task through supervised learning. SFT is a general-purpose technique for improving model performance on a specific downstream task, which is usually domain specific. SFT can be applied directly to a foundation model or to a domain-adapted pretrained model.

In this case we use an open-source Verilog code dataset in which a natural-language description of the Verilog code is the input and the actual Verilog code is the output. We demonstrate that an SFT model trained on this dataset can be used for domain-specific code generation from an input prompt, which is useful when developing coding copilot applications.
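As an illustration of the description-to-code pairing described above, here is a minimal sketch of what one training record could look like when stored as a line of JSONL. The `output` field name matches the `--answer-field "output"` argument used later in this playbook; the `input` field name and the example text are assumptions made for illustration, not taken from the dataset.

```python
import json

# Hypothetical record: the "input" field name and the example text are assumptions.
record = {
    "input": "Write a Verilog module for a 2-to-1 multiplexer with inputs a, b, sel and output y.",
    "output": "module mux2 (input a, input b, input sel, output y);\n"
              "  assign y = sel ? b : a;\n"
              "endmodule\n",
}

# JSONL stores one JSON object per line; json.dumps escapes the newlines
# inside the Verilog code so the record stays on a single line.
line = json.dumps(record)
restored = json.loads(line)

print(line)
```

Keeping the code in a single escaped string means multi-line Verilog modules survive the round trip unchanged.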

**Software Requirements**

1. Access to the latest NeMo Framework NGC containers.
2. This playbook has been tested on the nvcr.io/nvidia/nemo:25.02 container. It is expected to work similarly on other environments.

Launch the NeMo Framework container:

In [None]:
!docker run -it -p 8080:8080 -p 8088:8088 --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:25.02

Launch Jupyter Notebook as follows:

In [None]:
!jupyter notebook --allow-root --ip 0.0.0.0 --port 8088 --no-browser --NotebookApp.token=''

**Hardware Requirements**

This playbook has been tested on 2x A100 80GB GPUs, but it can be scaled to more GPUs as well as to multiple nodes by modifying the appropriate parameters.
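The playbook does not list which parameters to change. As a rough sketch, assuming the Megatron-style parallelism that NeMo training typically uses, the quantities involved are the number of nodes, the GPUs per node, and the tensor and pipeline parallel sizes. The helper below is hypothetical (it is not part of NeMo) and only checks that a chosen combination is consistent.

```python
def data_parallel_size(num_nodes, gpus_per_node, tensor_parallel=1, pipeline_parallel=1):
    """Return how many data-parallel replicas a given GPU layout supports.

    Hypothetical helper for illustration only: the total GPU count must be
    divisible by tensor_parallel * pipeline_parallel.
    """
    world_size = num_nodes * gpus_per_node
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"{world_size} GPUs cannot be split into groups of {model_parallel}"
        )
    return world_size // model_parallel


# The tested setup: one node with two GPUs and (assumed) tensor parallel size 2.
print(data_parallel_size(num_nodes=1, gpus_per_node=2, tensor_parallel=2))
# Scaling out: two nodes with eight GPUs each, same model-parallel layout.
print(data_parallel_size(num_nodes=2, gpus_per_node=8, tensor_parallel=2))
```

Adding nodes while keeping the model-parallel sizes fixed increases only the number of data-parallel replicas, which is usually the simplest way to scale.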

**Step 1**

Download the Llama-2-7b model from Hugging Face and convert it to .nemo format. Remove the original download once the conversion is complete.

In [None]:
!git lfs install
!git clone https://huggingface.co/meta-llama/Llama-2-7b-hf

In [None]:
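# Each `!` command runs in its own subshell, so `!cd` does not persist.
# Use the %cd magic to change the notebook's working directory instead.
%cd Llama-2-7b-hf
!python3 ../convert.py
%cd ..
# Remove the original Hugging Face download once conversion is complete.
!rm -rf Llama-2-7b-hf/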

**Step 2**

Download the Verilog dataset, preprocess it into train, validation, and test splits, then run the supervised fine-tuning step.


In [None]:
# `!cd` would not persist between commands, so use the %cd magic.
%cd code/
!python3 run_sft.py
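The preprocessing step mentioned above splits the dataset into train, validation, and test sets. The following is a minimal sketch of such a split, not the logic used by run_sft.py: the function names, the 80/10/10 ratio, the fixed seed, and the `input` field name are all assumptions made for illustration.

```python
import json
import random


def split_records(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle records deterministically and split them into train/val/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Placeholder records standing in for (description, Verilog code) pairs.
records = [{"input": f"description {i}", "output": f"// verilog {i}"} for i in range(100)]
train, val, test = split_records(records)
print(len(train), len(val), len(test))
```

Each split could then be saved with, for example, `write_jsonl("test.jsonl", test)`. Fixing the seed keeps the test set identical across runs, so the base and SFT models are scored on the same examples.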

**Step 3**

Once the SFT step is complete, run the inference step to generate predictions from both the base and SFT models.

In [None]:
!python3 run_inference.py

**Step 4**

Once the predictions are made, we evaluate their ROUGE scores. You should expect the SFT model's ROUGE scores to be much higher than those of the base model.

In [None]:
# ROUGE scores for the base model predictions
!python3 /opt/NeMo/scripts/metric_calculation/compute_rouge.py --ground-truth /workspace/data/verilog/test.jsonl --preds /workspace/inference/base_llama_prediction.jsonl --answer-field "output"
# ROUGE scores for the SFT model predictions
!python3 /opt/NeMo/scripts/metric_calculation/compute_rouge.py --ground-truth /workspace/data/verilog/test.jsonl --preds /workspace/inference/sft_prediction.jsonl --answer-field "output"
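To give a feel for what the metric measures, here is a simplified sketch of ROUGE-L, which scores a prediction by the longest common subsequence (LCS) of tokens it shares with the reference. This is not the implementation in compute_rouge.py, which may tokenize differently and report other ROUGE variants. The whitespace tokenization and the function names are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, prediction):
    """ROUGE-L F1 over whitespace tokens (simplified; assumed tokenization)."""
    ref = reference.split()
    pred = prediction.split()
    if not ref or not pred:
        return 0.0
    lcs = lcs_length(ref, pred)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "assign y = a & b ;"
print(rouge_l_f1(reference, "assign y = a & b ;"))  # identical prediction
print(rouge_l_f1(reference, "assign y = a | b ;"))  # one token differs
```

A prediction that reproduces most of the reference tokens in order scores close to 1, which is why a model fine-tuned on Verilog is expected to outscore the base model on this test set.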