<a href="https://colab.research.google.com/github/panchambanerjee/finetuning_expts/blob/main/huggingface_sft_trl/test_mistral7b_texttosql_peft.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### We fine-tuned a Mistral-7B text-to-SQL model; here we test it on a few samples

* Code: https://github.com/panchambanerjee/finetuning_expts/blob/main/huggingface_sft_trl/mistral_hf_finetune.ipynb
* Model: https://huggingface.co/delayedkarma/mistral-7b-text-to-sql

Primary reference: https://www.philschmid.de/fine-tune-llms-in-2024-with-trl

In [1]:
!pip install -qqq datasets trl peft --progress-bar off

In [15]:
import os
import gc
import torch


from transformers import AutoTokenizer, pipeline
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM
from random import randint
from google.colab import userdata

# Defined in the secrets tab in Google Colab
hf_token = userdata.get('huggingface')

In [16]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
peft_model_id = "delayedkarma/mistral-7b-text-to-sql"
# peft_model_id = args.output_dir

# Load Model with PEFT adapter
model = AutoPeftModelForCausalLM.from_pretrained(
  peft_model_id,
  device_map="auto",
  torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# load into pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', ...]

In [4]:
# System prompt template; {schema} is filled in per sample
system_message = """You are a text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
{schema}"""

def create_conversation(sample):
  return {
    "messages": [
      {"role": "system", "content": system_message.format(schema=sample["context"])},
      {"role": "user", "content": sample["question"]},
      {"role": "assistant", "content": sample["answer"]}
    ]
  }

# Load dataset from the hub
dataset = load_dataset("b-mc2/sql-create-context", split="train")
dataset = dataset.shuffle().select(range(100))

# Convert dataset to OAI messages
dataset = dataset.map(create_conversation, remove_columns=dataset.features, batched=False)
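
To make the conversion concrete, here is a minimal sketch of what `create_conversation` produces for a single record. The record below is hypothetical: its keys follow the `context`, `question`, and `answer` columns used above, but the values are made up for illustration and are not taken from the dataset.

```python
# Sketch: apply the same conversion to one hand-written record.
system_message = """You are a text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
{schema}"""

def create_conversation(sample):
    return {
        "messages": [
            {"role": "system", "content": system_message.format(schema=sample["context"])},
            {"role": "user", "content": sample["question"]},
            {"role": "assistant", "content": sample["answer"]},
        ]
    }

# Hypothetical record (illustrative values only)
sample = {
    "context": "CREATE TABLE employees (name VARCHAR, salary INTEGER)",
    "question": "Which employees earn more than 50000?",
    "answer": "SELECT name FROM employees WHERE salary > 50000",
}

conversation = create_conversation(sample)
for message in conversation["messages"]:
    print(f"{message['role']}: {message['content']}")
```

At inference time only the first two messages (system and user) are passed to the model; the assistant message serves as the reference answer.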



In [7]:
dataset = dataset.train_test_split(test_size=20/100)

In [8]:
dataset['test']

Dataset({
    features: ['messages'],
    num_rows: 20
})

In [10]:
dataset['test'].to_pandas().head()

| | messages |
|---|---|
| 0 | [{'content': 'You are a text to SQL query tran... |
| 1 | [{'content': 'You are a text to SQL query tran... |
| 2 | [{'content': 'You are a text to SQL query tran... |
| 3 | [{'content': 'You are a text to SQL query tran... |
| 4 | [{'content': 'You are a text to SQL query tran... |


### Evaluate the model on the test split

In [17]:
# Load our test dataset
eval_dataset = dataset['test']
# randint is inclusive on both ends, so the upper bound must be len - 1
rand_idx = randint(0, len(eval_dataset) - 1)

# Test on sample
prompt = pipe.tokenizer.apply_chat_template(eval_dataset[rand_idx]["messages"][:2], tokenize=False, add_generation_prompt=True)
# Greedy decoding: with do_sample=False the sampling parameters (temperature, top_k, top_p) are ignored, so they are omitted
outputs = pipe(prompt, max_new_tokens=256, do_sample=False, eos_token_id=pipe.tokenizer.eos_token_id, pad_token_id=pipe.tokenizer.pad_token_id)

print(f"Query:\n{eval_dataset[rand_idx]['messages'][1]['content']}")
print(f"Original Answer:\n{eval_dataset[rand_idx]['messages'][2]['content']}")
print(f"Generated Answer:\n{outputs[0]['generated_text'][len(prompt):].strip()}")


Query:
How many matched scored 3–6, 7–6(5), 6–3?
Original Answer:
SELECT COUNT(surface) FROM table_1028356_3 WHERE score = "3–6, 7–6(5), 6–3"
Generated Answer:
SELECT COUNT(surface) FROM table_1028356_3 WHERE score = "3–6, 7–6(5), 6–3"


In [18]:
# Test on another sample (randint is inclusive on both ends, hence len - 1)
rand_idx = randint(0, len(eval_dataset) - 1)

prompt = pipe.tokenizer.apply_chat_template(eval_dataset[rand_idx]["messages"][:2], tokenize=False, add_generation_prompt=True)
# Greedy decoding: sampling parameters are ignored when do_sample=False, so they are omitted
outputs = pipe(prompt, max_new_tokens=256, do_sample=False, eos_token_id=pipe.tokenizer.eos_token_id, pad_token_id=pipe.tokenizer.pad_token_id)

print(f"Query:\n{eval_dataset[rand_idx]['messages'][1]['content']}")
print(f"Original Answer:\n{eval_dataset[rand_idx]['messages'][2]['content']}")
print(f"Generated Answer:\n{outputs[0]['generated_text'][len(prompt):].strip()}")

Query:
Who directed the title that was written by Adam Milch?
Original Answer:
SELECT directed_by FROM table_12419515_4 WHERE written_by = "Adam Milch"
Generated Answer:
SELECT directed_by FROM table_12419515_4 WHERE written_by = "Adam Milch"


In [19]:
# Test on another sample (randint is inclusive on both ends, hence len - 1)
rand_idx = randint(0, len(eval_dataset) - 1)

prompt = pipe.tokenizer.apply_chat_template(eval_dataset[rand_idx]["messages"][:2], tokenize=False, add_generation_prompt=True)
# Greedy decoding: sampling parameters are ignored when do_sample=False, so they are omitted
outputs = pipe(prompt, max_new_tokens=256, do_sample=False, eos_token_id=pipe.tokenizer.eos_token_id, pad_token_id=pipe.tokenizer.pad_token_id)

print(f"Query:\n{eval_dataset[rand_idx]['messages'][1]['content']}")
print(f"Original Answer:\n{eval_dataset[rand_idx]['messages'][2]['content']}")
print(f"Generated Answer:\n{outputs[0]['generated_text'][len(prompt):].strip()}")

Query:
How many average scores have preliminary scores over 9.25, interview scores more than 9.44, and evening gown scores of 9.77?
Original Answer:
SELECT COUNT(average) FROM table_name_98 WHERE preliminaries > 9.25 AND interview > 9.44 AND evening_gown = 9.77
Generated Answer:
SELECT COUNT(average) FROM table_name_98 WHERE preliminaries > 9.25 AND interview > 9.44 AND evening_gown = 9.77
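
The three test cells above repeat the same steps: build the prompt, generate greedily, and strip the prompt from the output. As a sketch, those steps could be wrapped in a small helper. `generate_sql` and `show_samples` are hypothetical names introduced here, and the code assumes `pipe` and `eval_dataset` behave as in the cells above.

```python
from random import sample as sample_indices

def generate_sql(pipe, messages, max_new_tokens=256):
    """Greedy-decode an answer from the system and user messages of one sample."""
    prompt = pipe.tokenizer.apply_chat_template(
        messages[:2], tokenize=False, add_generation_prompt=True
    )
    outputs = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        eos_token_id=pipe.tokenizer.eos_token_id,
        pad_token_id=pipe.tokenizer.pad_token_id,
    )
    # The pipeline returns prompt + completion; keep only the completion
    return outputs[0]["generated_text"][len(prompt):].strip()

def show_samples(pipe, eval_dataset, k=3):
    """Print query, reference and generated answer for k distinct random samples."""
    for idx in sample_indices(range(len(eval_dataset)), k):
        messages = eval_dataset[idx]["messages"]
        print(f"Query:\n{messages[1]['content']}")
        print(f"Original Answer:\n{messages[2]['content']}")
        print(f"Generated Answer:\n{generate_sql(pipe, messages)}\n")

# Example call, assuming pipe and eval_dataset from the cells above:
# show_samples(pipe, eval_dataset, k=3)
```

Drawing the indices with `random.sample` avoids both the off-by-one risk of `randint` and picking the same sample twice.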


In [24]:
eval_dataset[rand_idx]["messages"][:2]

[{'content': 'You are a text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_98 (average VARCHAR, evening_gown VARCHAR, preliminaries VARCHAR, interview VARCHAR)',
  'role': 'system'},
 {'content': 'How many average scores have preliminary scores over 9.25, interview scores more than 9.44, and evening gown scores of 9.77?',
  'role': 'user'}]

In [23]:
prompt

'<|im_start|>system\nYou are a text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_98 (average VARCHAR, evening_gown VARCHAR, preliminaries VARCHAR, interview VARCHAR)<|im_end|>\n<|im_start|>user\nHow many average scores have preliminary scores over 9.25, interview scores more than 9.44, and evening gown scores of 9.77?<|im_end|>\n<|im_start|>assistant\n'
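
The printed prompt follows a ChatML-style layout: each message is wrapped as `<|im_start|>{role}\n{content}<|im_end|>\n`, and the generation prompt opens an assistant turn. Purely as an illustration, the sketch below reproduces that layout in plain Python. `render_chatml` is a hypothetical helper inferred from the output above; the tokenizer's own chat template remains the source of truth.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the ChatML-style layout seen in the prompt above."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues with its answer
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

# The system and user messages shown in the previous cell
messages = [
    {
        "role": "system",
        "content": "You are a text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_98 (average VARCHAR, evening_gown VARCHAR, preliminaries VARCHAR, interview VARCHAR)",
    },
    {
        "role": "user",
        "content": "How many average scores have preliminary scores over 9.25, interview scores more than 9.44, and evening gown scores of 9.77?",
    },
]

rendered = render_chatml(messages)
print(rendered)
```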

For a naive evaluation over the test set, refer to Phil Schmid's original notebook and https://github.com/panchambanerjee/finetuning_expts/blob/main/huggingface_sft_trl/mistral_hf_finetune.ipynb
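
One simple form of naive evaluation is an exact string match between the generated query and the reference query. The sketch below assumes that comparing whitespace-normalized strings is good enough; the helper names are hypothetical, and this is not necessarily the metric used in the referenced notebooks. The pairs are the three samples shown above.

```python
def normalize(sql):
    """Collapse runs of whitespace and strip the ends (also affects whitespace inside literals)."""
    return " ".join(sql.split())

def exact_match_accuracy(pairs):
    """Fraction of (reference, generated) pairs that match after normalization."""
    if not pairs:
        return 0.0
    hits = sum(normalize(ref) == normalize(gen) for ref, gen in pairs)
    return hits / len(pairs)

# (reference, generated) pairs copied from the three samples above
pairs = [
    (
        'SELECT COUNT(surface) FROM table_1028356_3 WHERE score = "3–6, 7–6(5), 6–3"',
        'SELECT COUNT(surface) FROM table_1028356_3 WHERE score = "3–6, 7–6(5), 6–3"',
    ),
    (
        'SELECT directed_by FROM table_12419515_4 WHERE written_by = "Adam Milch"',
        'SELECT directed_by FROM table_12419515_4 WHERE written_by = "Adam Milch"',
    ),
    (
        "SELECT COUNT(average) FROM table_name_98 WHERE preliminaries > 9.25 AND interview > 9.44 AND evening_gown = 9.77",
        "SELECT COUNT(average) FROM table_name_98 WHERE preliminaries > 9.25 AND interview > 9.44 AND evening_gown = 9.77",
    ),
]

accuracy = exact_match_accuracy(pairs)
print(f"Exact-match accuracy: {accuracy:.2%}")
```

Exact match is strict: a semantically equivalent query with different formatting or clause order counts as a miss, so the score tends to understate how often the model is actually correct.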

### For text-to-text generation we have to use the full model; here we use only the adapter weights
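
If a standalone model is needed, one common approach is to merge the adapter into the base weights with PEFT's `merge_and_unload()` and save the result. This is a minimal sketch, assuming the adapter is a LoRA adapter and that there is enough memory to hold the model in float16. `merge_adapter` and the output directory name are hypothetical, and the heavy imports are deferred so that defining the function does not require a GPU runtime.

```python
def merge_adapter(peft_model_id, output_dir):
    """Merge a PEFT (LoRA) adapter into its base model and save the merged weights."""
    # Imported lazily because these are heavy dependencies
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Loads the base model and applies the adapter on top of it
    model = AutoPeftModelForCausalLM.from_pretrained(
        peft_model_id,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    )
    # Fold the adapter weights into the base layers and drop the PEFT wrappers
    merged_model = model.merge_and_unload()
    merged_model.save_pretrained(output_dir, safe_serialization=True)
    # Save the tokenizer alongside so the directory is self-contained
    AutoTokenizer.from_pretrained(peft_model_id).save_pretrained(output_dir)
    return output_dir

# Example call (downloads and merges the full model, so it is left commented out;
# the output directory name is an assumption):
# merge_adapter("delayedkarma/mistral-7b-text-to-sql", "mistral-7b-text-to-sql-merged")
```

The merged directory can then be loaded with a plain `AutoModelForCausalLM.from_pretrained(...)` call, without PEFT installed.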