# Set up environment
Let's start by installing the Hugging Face libraries needed for supervised fine-tuning (SFT) of our pretrained LLM.
- `transformers` for loading LLMs and tokenizers.
- `datasets` for downloading datasets from the Hugging Face Hub and preparing them for inference and fine-tuning.
- `bitsandbytes` for quantizing LLM weights from a higher-precision format to a lower-precision one.
- `peft` (parameter-efficient fine-tuning) implements LoRA adapters, which reduce the hardware resources needed for fine-tuning.
- `accelerate` for device placement when loading the model; it is required by the `device_map` argument used later.
- `trl` for the supervised fine-tuning trainer class, `SFTTrainer`.

In [None]:
pip install -q -U bitsandbytes transformers peft accelerate datasets trl

## Import all necessary modules and log in to Hugging Face

In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments, Trainer
import torch
from huggingface_hub import login
login(
  token="", # ADD YOUR TOKEN HERE
  add_to_git_credential=True
)
import os
# disable Weights and Biases
os.environ['WANDB_DISABLED']="true"

# Dataset Format Support for SFTTrainer

The `SFTTrainer` supports popular dataset formats, allowing you to pass the dataset to the trainer directly without any pre-processing. The following formats are supported:

## Conversational Format

```json
{"messages": [{"role": "system", "content": "You are helpful"}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "system", "content": "You are helpful"}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "system", "content": "You are helpful"}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "..."}]}
```

## Instruction Format

```json
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
```

If your dataset uses one of the above formats, you can directly pass it to the trainer without pre-processing. The `SFTTrainer` will format the dataset for you using the defined format from the model’s tokenizer with the `apply_chat_template` method.
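
To make the two formats concrete, here is a minimal pure-Python sketch that checks which format a record follows and converts an instruction record into a conversational one. `detect_format` and `instruction_to_conversational` are hypothetical helpers written for illustration; they are not part of `trl`.

```python
def detect_format(record):
    """Return which dataset format a record follows (hypothetical helper)."""
    if isinstance(record.get("messages"), list):
        # Every message needs at least a role and a content field.
        well_formed = all({"role", "content"} <= set(m) for m in record["messages"])
        return "conversational" if well_formed else "unknown"
    if "prompt" in record and "completion" in record:
        return "instruction"
    return "unknown"


def instruction_to_conversational(record):
    """Wrap a prompt/completion pair as a two-turn conversation (hypothetical helper)."""
    return {
        "messages": [
            {"role": "user", "content": record["prompt"]},
            {"role": "assistant", "content": record["completion"]},
        ]
    }


instruction_record = {"prompt": "<prompt text>", "completion": "<ideal generated text>"}
print(detect_format(instruction_record))
print(detect_format(instruction_to_conversational(instruction_record)))
```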

In this step we convert our dataset into the conversational format. We shuffle the dataset, downsample it to 12,500 rows, and split it into 10,000 training rows and 2,500 test rows.

In [None]:
system_message = """You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
{schema}"""
 
def create_conversation(sample):
  return {
    "messages": [
      {"role": "system", "content": system_message.format(schema=sample["context"])},
      {"role": "user", "content": sample["question"]},
      {"role": "assistant", "content": sample["answer"]}
    ]
  }
 
dataset = load_dataset("b-mc2/sql-create-context", split="train")
dataset = dataset.shuffle().select(range(12500))
 
dataset = dataset.map(create_conversation, remove_columns=dataset.column_names, batched=False)
dataset = dataset.train_test_split(test_size=2500/12500)
 
print(dataset["train"][345]["messages"])
 

# Loading LLM for SFT

Next, we load our LLM and tokenizer. Here we use Google's Gemma 2B instruct model (`google/gemma-2b-it`); you can switch models by changing `model_name`.
We use `AutoModelForCausalLM` to download and load the model.
We also pass a quantization config to `AutoModelForCausalLM.from_pretrained` so that the weights are quantized to 4 bits at load time.

Correctly preparing the model and tokenizer is crucial when training chat/conversational models. We need to add new special tokens to the tokenizer and model to teach them the different roles in a conversation. `trl` provides a convenient method for this, `setup_chat_format`, which:

- Adds special tokens to the tokenizer, e.g. `<|im_start|>` and `<|im_end|>`, to indicate the start and end of each message in a conversation.
- Resizes the model’s embedding layer to accommodate the new tokens.
- Sets the `chat_template` of the tokenizer, which is used to format the input data into a chat-like format. The default is `chatml` from OpenAI.
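
To see what the ChatML layout looks like, the sketch below renders a conversation by hand. `render_chatml` is a hypothetical function that approximates the template in plain Python; in the notebook itself the tokenizer's `apply_chat_template` method does this work.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render {"role", "content"} dicts as ChatML-style text (illustrative only)."""
    parts = []
    for message in messages:
        # Each message is wrapped in <|im_start|> ... <|im_end|> markers.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn and leave it empty so the model fills it in.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "What's the capital of France?"},
]
print(render_chatml(example, add_generation_prompt=True))
```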

In [None]:
from trl import setup_chat_format

compute_dtype = torch.float16  # dtype used for computation on the dequantized weights
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,        # do not quantize the quantization constants
)
model_name = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
device_map = {"": 0}  # place the whole model on GPU 0
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map=device_map,
)
tokenizer.padding_side = "right"  # to prevent warnings
model, tokenizer = setup_chat_format(model, tokenizer)
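
As a rough back-of-the-envelope check on why 4-bit loading helps, the sketch below estimates the memory taken by the weights alone. The parameter count is an assumption (a round figure for a "2B"-class model), and the estimate ignores quantization constants, activations, gradients, and optimizer state.

```python
import math


def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights only, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


num_params = 2.5e9  # assumption: rough parameter count, for illustration only
for label, bits in [("float16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(num_params, bits):.1f} GiB")
```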

# Loading up LoRA config

- In this step, we create the LoRA config object that will be passed to the `SFTTrainer`.
- We set the rank to `r=32`.
- The target modules are the query, key, and value projections (`q_proj`, `k_proj`, `v_proj`) of the attention block, plus any layers named `dense`.
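
To get a feel for what the rank controls, the sketch below counts the parameters LoRA adds next to a single frozen weight matrix: two low-rank factors of shapes `r x d_in` and `d_out x r`. The layer size is hypothetical and chosen only for illustration, and `lora_extra_params` is an illustrative helper, not a `peft` function. The `lora_alpha / r` scaling is the default LoRA scaling.

```python
def lora_extra_params(d_in, d_out, r):
    """Trainable parameters LoRA adds beside one frozen d_out x d_in weight."""
    return r * (d_in + d_out)


r, lora_alpha = 32, 32
d_in = d_out = 2048  # hypothetical layer size, for illustration only

frozen = d_in * d_out
extra = lora_extra_params(d_in, d_out, r)
print(f"frozen weight params: {frozen:,}")
print(f"LoRA adapter params:  {extra:,} ({extra / frozen:.1%} of the layer)")
print(f"scaling factor lora_alpha / r = {lora_alpha / r}")
```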

In [None]:
from peft import LoraConfig
 
peft_config = LoraConfig(
        lora_alpha=32,
        lora_dropout=0.05,
        r=32,
        bias="none",
        target_modules=["q_proj", "k_proj", "v_proj", "dense"],
        task_type="CAUSAL_LM",
)

# Preparing Training arguments

Pass all the necessary arguments to the `TrainingArguments` object. Here we do not evaluate the model during training. If you want to add an evaluation step, pass these arguments too, along with an `eval_dataset` when constructing the trainer:
- `evaluation_strategy="steps"` (or `"epoch"`)
- `eval_steps=100`
- `do_eval=True`

In [None]:
output_dir = "./peft-gemma-2b-sql-SFTT"

args = TrainingArguments(
    output_dir=output_dir,              # output directory
    num_train_epochs=1,                 # number of epochs (overridden by max_steps below)
    per_device_train_batch_size=1,      # batch size per device
    gradient_accumulation_steps=4,      # accumulate gradients over 4 batches per optimizer step
    gradient_checkpointing=True,        # trade extra compute for lower activation memory
    optim="adamw_torch_fused",          # fused AdamW implementation from PyTorch
    logging_steps=25,                   # log every 25 steps
    save_strategy="steps",              # save by step count; can also be "epoch"
    learning_rate=2e-4,
    fp16=True,                          # fp16 mixed precision; use bf16=True instead if your GPU supports it
    max_grad_norm=0.3,                  # gradient clipping threshold
    warmup_ratio=0.03,
    lr_scheduler_type="constant",       # constant learning rate (this scheduler type applies no warmup)
    max_steps=1000,                     # overrides num_train_epochs
    save_steps=100,                     # save a checkpoint every save_steps steps
    overwrite_output_dir=True,          # boolean, not the string 'True'; overwrites the directory contents
)
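
The arguments above determine how much data the run actually sees. The arithmetic below is a small sketch of that; `num_devices = 1` is an assumption based on the single-GPU `device_map` used earlier.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 1000
save_steps = 100
num_devices = 1  # assumption: a single GPU

# Sequences contributing to each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Total sequences processed before training stops at max_steps.
sequences_seen = effective_batch_size * max_steps
# Checkpoints written to the output directory during training.
checkpoints = max_steps // save_steps

print(f"effective batch size: {effective_batch_size}")
print(f"sequences seen:       {sequences_seen}")
print(f"checkpoints saved:    {checkpoints}")
```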

# Preparing SFT Trainer object

In this step we pass:
- The maximum sequence length; change this to suit your dataset.
- The base model and tokenizer.
- The training dataset.
- The PEFT config.
- `packing=True`: `SFTTrainer` supports example packing, where multiple short examples are packed into the same input sequence to increase training efficiency.
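
Example packing can be pictured as concatenating the tokenized examples into one stream and cutting it into blocks of `max_seq_length` tokens. The sketch below is a simplified pure-Python illustration, not TRL's actual implementation. `pack_examples` and its `concat_token_id` parameter are hypothetical names, and dropping the trailing partial block is an assumption of this sketch.

```python
def pack_examples(tokenized_examples, max_seq_length, concat_token_id=None):
    """Concatenate token lists and cut the stream into fixed-size blocks."""
    stream = []
    for tokens in tokenized_examples:
        stream.extend(tokens)
        if concat_token_id is not None:
            # Optionally separate examples with a token such as EOS.
            stream.append(concat_token_id)
    n_full_blocks = len(stream) // max_seq_length
    # Keep only complete blocks; leftover tokens at the end are dropped.
    return [
        stream[i * max_seq_length:(i + 1) * max_seq_length]
        for i in range(n_full_blocks)
    ]


short_examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(pack_examples(short_examples, max_seq_length=4))
print(pack_examples(short_examples, max_seq_length=4, concat_token_id=0))
```

Without packing, each of the three short examples would occupy its own padded sequence; with packing, the same tokens fill fewer, fully used sequences.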

In [None]:
from trl import SFTTrainer
 
max_seq_length = 1024 # max sequence length for model and packing of the dataset
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset['train'],
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    packing=True,
    dataset_kwargs={
        "add_special_tokens": False,  # We template with special tokens
        "append_concat_token": False, # No need to add additional separator token
    }
)

## Begin training and then save the model

In [None]:
# start training; checkpoints are written to output_dir every save_steps steps
trainer.train()
 
# save the final model to output_dir
trainer.save_model()

### Freeing up some memory

In [None]:
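# the trainer keeps its own reference to the model, so delete both before clearing the cache
del model
del trainer
torch.cuda.empty_cache()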


# Test and evaluate the fine-tuned LLM

- We use `AutoPeftModelForCausalLM` from the `peft` library to load the fine-tuned LLM with its PEFT adapters.
- We also load the tokenizer saved with the fine-tuned model.
- We use the `pipeline` helper from `transformers` for the text-generation use case.

In [None]:
from peft import AutoPeftModelForCausalLM
from transformers import pipeline
peft_model = AutoPeftModelForCausalLM.from_pretrained(
  output_dir,
  device_map="auto",
  torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(output_dir)
pipe = pipeline("text-generation", model=peft_model, tokenizer=tokenizer)


## Loading the base model for comparison with the fine-tuned model

In [None]:
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map=device_map,
)
base_tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model, base_tokenizer = setup_chat_format(base_model, base_tokenizer)
base_model_pipe = pipeline("text-generation", model=base_model, tokenizer=base_tokenizer)

# Final step of evaluation

In this step we use `eval_dataset` (the test split) to evaluate our fine-tuned model and compare it with the base model.

- We use `apply_chat_template` to render the prompt in conversational format, leaving the assistant turn empty for generation.
- We loop over 5 random samples and generate outputs from both models.
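
The side-by-side comparison can also be summarized as a single number. The sketch below computes exact-match accuracy after light normalization. `normalize_sql` and `exact_match_accuracy` are hypothetical helpers, and exact match is only a rough proxy, since two differently written queries can be equivalent.

```python
def normalize_sql(query):
    """Lowercase, drop a trailing semicolon, and collapse whitespace.

    Lowercasing also affects string literals, which is acceptable for a rough check.
    """
    return " ".join(query.strip().rstrip(";").lower().split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalize_sql(p) == normalize_sql(r) for p, r in zip(predictions, references)
    )
    return matches / len(references)


reference = 'SELECT score FROM table_name_77 WHERE losing_team = "sydney roosters" AND total = 88'
predictions = [
    'select score  from table_name_77 where losing_team = "sydney roosters" and total = 88;',
    "Which Score has a Losing Team of sydney roosters, and a Total of 88?",
]
print(exact_match_accuracy(predictions, [reference, reference]))
```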

In [None]:
from random import randrange

eval_dataset = dataset['test']
for i in range(5):
    # randrange excludes the upper bound, so the index is always valid
    rand_idx = randrange(len(eval_dataset))
    messages = eval_dataset[rand_idx]["messages"]

    # Build the prompt from the system and user turns only; the assistant turn is left for generation
    prompt = pipe.tokenizer.apply_chat_template(messages[:2], tokenize=False, add_generation_prompt=True)
    base_model_prompt = base_model_pipe.tokenizer.apply_chat_template(messages[:2], tokenize=False, add_generation_prompt=True)

    # Greedy decoding (do_sample=False), so sampling parameters such as temperature are not needed
    outputs = pipe(prompt, max_new_tokens=256, do_sample=False, eos_token_id=pipe.tokenizer.eos_token_id, pad_token_id=pipe.tokenizer.pad_token_id)
    base_model_outputs = base_model_pipe(base_model_prompt, max_new_tokens=256, do_sample=False, eos_token_id=base_model_pipe.tokenizer.eos_token_id, pad_token_id=base_model_pipe.tokenizer.pad_token_id)

    print(f"Context:\n{messages[0]['content']}")
    print(f"Query:\n{messages[1]['content']}")
    print(f"Original Answer:\n{messages[2]['content']}\n")
    print(f"Generated Answer:\n{outputs[0]['generated_text'][len(prompt):].strip()}\n")
    print(f"Base Model Generated Answer:\n{base_model_outputs[0]['generated_text'][len(base_model_prompt):].strip()}")

    print("\n\n")

In [None]:
Context:
You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
CREATE TABLE table_name_77 (score VARCHAR, losing_team VARCHAR, total VARCHAR)
Query:
Which Score has a Losing Team of sydney roosters, and a Total of 88?
Original Answer:
SELECT score FROM table_name_77 WHERE losing_team = "sydney roosters" AND total = 88

Generated Answer:
SELECT score FROM table_name_77 WHERE losing_team = "sydney roosters" AND total = 88

Base Model Generated Answer:
Which Score has a Losing Team of sydney roosters, and a Total of 88?



Context:
You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
CREATE TABLE table_name_64 (moving_to VARCHAR, transfer_fee VARCHAR, name VARCHAR)
Query:
Where is Odjidja-Ofoe, with an undisclosed transfer fee, moving to?
Original Answer:
SELECT moving_to FROM table_name_64 WHERE transfer_fee = "undisclosed" AND name = "odjidja-ofoe"
...