## Register for a Hugging Face Account
If you do not already have a Hugging Face account, you will need to create one. Register at the following link to get started:

- [Register here](https://huggingface.co/) to create a Hugging Face account.

## Use Meta's Latest Language Model: Meta-Llama-3-8B
In this notebook, we will be working with Meta's latest language model, "meta-llama/Meta-Llama-3-8B". Make sure to obtain the necessary access before you start; refer to [this link](https://huggingface.co/meta-llama/Meta-Llama-3-8B).

## Educational Resources
To maximize the potential of Hugging Face's offerings, familiarize yourself with the following tutorials:

- **Language Model Tutorial:** Learn how to leverage large language models effectively by consulting the [Transformers documentation](https://huggingface.co/docs/transformers/index).
- **Open Source Dataset Tutorial:** Explore and utilize datasets available on Hugging Face with the [Datasets documentation](https://huggingface.co/docs/datasets/index).
- **Fine-Tuning Models Tutorial:** For hands-on guidance on fine-tuning models within notebooks, refer to [this tutorial](https://huggingface.co/docs/transformers/notebooks).


In [1]:
# Import the 'login' function from the huggingface_hub library to authenticate and access your Hugging Face account.
from huggingface_hub import login
# Call the login function to authenticate. You'll need to enter your credentials or token.
login()


In [2]:
import torch
import os
import sys
import json
import pandas as pd
from IPython.display import display, HTML
from datetime import datetime
import datasets
from datasets import load_dataset

import trl
from trl import setup_chat_format, SFTTrainer

import peft
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model, AutoPeftModelForCausalLM
import transformers
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline
)

from random import randint
print("Torch version:", torch.__version__)
print("Datasets version:", datasets.__version__)
print("TRL (Transformer Reinforcement Learning) version:", trl.__version__)
print("PEFT (Parameter Efficient Fine-Tuning) version:", peft.__version__)
print("Transformers version:", transformers.__version__)

2024-06-02 20:29:18.285930: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-02 20:29:18.636457: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-02 20:29:18.636491: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-02 20:29:18.636522: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-02 20:29:18.652420: I tensorflow/core/platform/cpu_feature_g

Torch version: 2.1.2+cu118
Datasets version: 2.19.1
TRL (Transformer Reinforcement Learning) version: 0.8.6
PEFT (Parameter Efficient Fine-Tuning) version: 0.11.1
Transformers version: 4.40.2


# Fine-Tuning a Large Language Model (LLM) for SQL Query Generation

## Notebook Overview
This notebook focuses on fine-tuning a Large Language Model (LLM) to perform a specialized task in natural language processing. The objective is to train the model to generate SQL queries based on provided language descriptions (questions) and SQL CREATE TABLE statements (context). 

## Dataset
We will utilize an open-source dataset available on Hugging Face, which is specifically structured for training models to understand and generate SQL queries from natural language descriptions and table contexts.

- **Dataset Source:** You can access the dataset and view its detailed structure and samples by visiting [b-mc2/sql-create-context on Hugging Face](https://huggingface.co/datasets/b-mc2/sql-create-context). This dataset provides a rich collection of examples where each entry includes:
  - `question`: A natural language description of the data retrieval or manipulation task.
  - `context`: SQL CREATE TABLE statements providing the schema of the database relevant to the question.
  - `answer`: The SQL query that correctly retrieves or manipulates the data as described in the question.

## Objective
The goal of this notebook is to demonstrate how to set up, train, and evaluate a model using this dataset.


<span style="color: red;">Furthermore, to give you an idea of the data needed for the model, consider what type of data is necessary when training an LLM to generate Playwright code for UI testing.</span>



1. What is UI Testing?
2. What is Playwright and How to Use It?
3. What is an LLM?
4. How to Use Hugging Face?
5. How to Fine-Tune a Large Model?
6. Type of Data Needed for Fine-Tuning for a UI Testing Task? Refer to the attached slides.
7. Evaluating and Testing the Model's Performance





Here's some inspiration

## Reflection on UI Testing for Software Development
When integrating this LLM into a software application, conducting UI tests is crucial to ensure the system functions correctly from the user's perspective. This section details the critical considerations for inputting, processing, and outputting data during UI tests.

### Model Inputs for UI Testing
- **User Input**: The actual text input by users, simulating real-world usage where they articulate their needs in terms of software interactions. This input tests the model’s ability to interpret and respond to varied, unstructured human language.
- **Schema Context**: A consistent input that provides the model with the structure of the software during testing, ensuring that the outputs are applicable and correctly formatted. This schema defines the boundaries and possibilities for the user's requests.

### Required Context for UI Testing
- **Relevance and Completeness**: To generate accurate responses, the model must fully understand both the provided software schema and how the user's request relates to it. This understanding is essential for ensuring that the responses are both technically correct and contextually appropriate.
- **Interaction History**: By maintaining a record of past interactions, the model can refine its responses based on previous questions and answers. This history helps improve the model’s accuracy and adaptability over time, providing more personalized responses to the user.

### Expected Outputs from the Model
- **Playwright Code**: The model should output accurate Playwright code that fulfills the user’s request based on the given description and software schema. Playwright is used here as a tool for automating browser tasks based on user requirements expressed in natural language.
- **Feedback Mechanisms**: If the model is unable to generate valid Playwright code or if the request is too ambiguous, it should offer feedback or request further clarification. This mechanism is crucial for identifying the model's limitations during user testing and for improving user experience by preventing misinterpretations and errors in real-world applications.

### Considerations for Effective UI Testing
- **Test Scenario Coverage**: Develop comprehensive test scenarios that cover a wide range of potential user interactions to ensure the model can handle diverse and unexpected inputs.
- **Error Handling**: Evaluate how the system responds to incorrect inputs or non-executable requests. Effective error handling is crucial for maintaining usability and user trust.
- **Performance Metrics**: Assess how quickly the model processes inputs and generates outputs. Response time is a critical factor in user satisfaction, especially in interactive applications where delays can disrupt the user experience.

Implementing thorough UI testing will help guarantee that the model not only performs well under controlled conditions but also operates effectively and reliably in real-world scenarios, enhancing the overall utility and user-friendliness of the software.
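To make the inputs and outputs described above more concrete, here is a minimal sketch of what one fine-tuning sample for a Playwright task could look like, using a system/user/assistant conversation layout. Everything in it is an assumption for illustration: the prompt wording, the field names `page`, `instruction`, and `code`, and the sample itself are hypothetical and do not come from a real dataset.

```python
# Hypothetical system prompt for a UI-testing task (illustrative wording only).
PLAYWRIGHT_SYSTEM_MESSAGE = """You are a natural-language to Playwright translator. Users will describe a UI interaction in English and you will generate Playwright code based on the provided PAGE.
PAGE:
{page}"""


def create_ui_test_conversation(sample):
    """Turn one raw sample into a system/user/assistant conversation."""
    return {
        "messages": [
            {"role": "system", "content": PLAYWRIGHT_SYSTEM_MESSAGE.format(page=sample["page"])},
            {"role": "user", "content": sample["instruction"]},
            {"role": "assistant", "content": sample["code"]},
        ]
    }


# Made-up sample: a simplified page description, a request, and the target code.
example_sample = {
    "page": '<form><input id="email"><input id="password" type="password">'
            '<button id="login">Log in</button></form>',
    "instruction": "Log in with the email user@example.com and the password secret.",
    "code": (
        'page.fill("#email", "user@example.com")\n'
        'page.fill("#password", "secret")\n'
        'page.click("#login")'
    ),
}

conversation = create_ui_test_conversation(example_sample)
for message in conversation["messages"]:
    print(f"[{message['role']}]\n{message['content']}\n")
```

The "schema context" here is a page description rather than a `CREATE TABLE` statement; which representation of the UI works best (raw HTML, an accessibility tree, a list of selectors) is an open design choice.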


In [None]:
# Define a system message template for the conversational model.
# This message sets the context by describing the role of the assistant and providing a database schema.
system_message = """You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.
SCHEMA:
{schema}"""

# Function to format individual samples from the dataset into a conversation format for training the model.
def create_conversation(sample):
    # Creates a dictionary representing a conversation with system, user, and assistant messages.
    return {
        "messages": [
            {"role": "system", "content": system_message.format(schema=sample["context"])},  # System role message with schema
            {"role": "user", "content": sample["question"]},  # User role message with the query question
            {"role": "assistant", "content": sample["answer"]}  # Assistant role message with the SQL query answer
        ]
    }

# Load the dataset from the Hugging Face Hub, specifically using the 'train' split.
dataset = load_dataset("b-mc2/sql-create-context", split="train")
# Shuffle the dataset and select the first 12,500 entries for processing.
dataset = dataset.shuffle().select(range(12500))

# Apply the create_conversation function to each sample in the dataset, converting them to the required format.
# Remove the original columns from the dataset as they are no longer needed after conversion.
dataset = dataset.map(create_conversation, remove_columns=dataset.features, batched=False)
# Split the dataset into 10,000 training samples and 2,500 test samples.
dataset = dataset.train_test_split(test_size=2500/12500)


In [4]:
# The `DatasetDict` object contains two subsets of data: `train` and `test`, which are used for training and evaluating the model, respectively.

dataset

DatasetDict({
    train: Dataset({
        features: ['messages'],
        num_rows: 10000
    })
    test: Dataset({
        features: ['messages'],
        num_rows: 2500
    })
})

In [5]:

def convert_to_dataframe(data):
    # Initialize lists to hold data for each column
    system_messages = []
    user_questions = []
    assistant_answers = []
    
    # Iterate through each conversation in the data
    for conversation in data:
        # Append each message to the corresponding list
        system_messages.append(conversation[0]['content'])
        user_questions.append(conversation[1]['content'])
        assistant_answers.append(conversation[2]['content'])
    
    # Create a DataFrame from the lists
    df = pd.DataFrame({
        'System Message': system_messages,
        'User Question': user_questions,
        'Assistant Answer': assistant_answers
    })
    
    return df


df = convert_to_dataframe(dataset["train"][:10]["messages"])
display(HTML(df.to_html()))

Unnamed: 0,System Message,User Question,Assistant Answer
0,"You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_97 (player VARCHAR, debut VARCHAR, born VARCHAR)","Who was the player with a debut of age 18 v plymouth , 14 august 2012, and born in Portsmouth?","SELECT player FROM table_name_97 WHERE debut = ""age 18 v plymouth , 14 august 2012"" AND born = ""portsmouth"""
1,"You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_17968282_1 (points VARCHAR, team VARCHAR)",Name the total number of points for newell's old boys,"SELECT COUNT(points) FROM table_17968282_1 WHERE team = ""Newell's Old Boys"""
2,"You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_19359427_6 (manner_of_departure VARCHAR, outgoing_manager VARCHAR)",What was Gary Megson's manner of departure?,"SELECT manner_of_departure FROM table_19359427_6 WHERE outgoing_manager = ""Gary Megson"""
3,"You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_88 (silver INTEGER, total VARCHAR)",What is the average number of silver medals won among nations that won 9 medals total?,SELECT AVG(silver) FROM table_name_88 WHERE total = 9
4,"You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_17 (silver VARCHAR, location VARCHAR)",What Silver has the Location of Guangzhou?,"SELECT silver FROM table_name_17 WHERE location = ""guangzhou"""
5,"You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_55 (circuit VARCHAR, date VARCHAR)",What Circuit has a Date of 25 july?,"SELECT circuit FROM table_name_55 WHERE date = ""25 july"""
6,"You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_21 (opponent VARCHAR, game VARCHAR)",Who is the opponent in game 7?,SELECT opponent FROM table_name_21 WHERE game = 7
7,"You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_35 (original_title VARCHAR, director VARCHAR, film_title_used_in_nomination VARCHAR)","What is Original title, when Director is Veljko Bulajić category:articles with hcards, and when Film title used in nomination is Train Without A Timetable?","SELECT original_title FROM table_name_35 WHERE director = ""veljko bulajić category:articles with hcards"" AND film_title_used_in_nomination = ""train without a timetable"""
8,"You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_23285805_6 (score VARCHAR, record VARCHAR)",What was the score with the team record was 15-25?,"SELECT score FROM table_23285805_6 WHERE record = ""15-25"""
9,You are an text to SQL query translator. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA.\nSCHEMA:\nCREATE TABLE table_name_93 (tournament VARCHAR),"During the Hamburg Masters Tournament, during which Jiří Novák was absent(A) in 1998, how did he do in 1997?","SELECT 1997 FROM table_name_93 WHERE 1998 = ""a"" AND tournament = ""hamburg masters"""


In [6]:
# save datasets to disk
dataset["train"].to_json("data/train_dataset.json", orient="records")
dataset["test"].to_json("data/test_dataset.json", orient="records")

Creating json from Arrow format:   0%|          | 0/10 [00:00<?, ?ba/s]

Creating json from Arrow format:   0%|          | 0/3 [00:00<?, ?ba/s]

1193600
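The files written above are, as far as I can tell, in JSON Lines form (one JSON object per line), which is what `to_json` produces by default with `orient="records"`. The following standard-library sketch shows that layout with two made-up records; the temporary path and the records are assumptions for illustration, not the notebook's real data.

```python
import json
import os
import tempfile

# Two made-up records in the same "messages" layout as the dataset above.
records = [
    {"messages": [
        {"role": "system", "content": "SCHEMA:\nCREATE TABLE t (a VARCHAR)"},
        {"role": "user", "content": "List every a."},
        {"role": "assistant", "content": "SELECT a FROM t"},
    ]},
    {"messages": [
        {"role": "system", "content": "SCHEMA:\nCREATE TABLE u (b INTEGER)"},
        {"role": "user", "content": "What is the largest b?"},
        {"role": "assistant", "content": "SELECT MAX(b) FROM u"},
    ]},
]

path = os.path.join(tempfile.mkdtemp(), "train_dataset.json")

# Write one JSON object per line (JSON Lines).
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read the file back line by line, skipping empty lines.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded), "records; first answer:", loaded[0]["messages"][2]["content"])
```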

## Loading a Pre-trained Model and Training

In [7]:
# Define the identifier for the model to be loaded, specifically "meta-llama/Meta-Llama-3-8B".
model_id = "meta-llama/Meta-Llama-3-8B" 

# Configure the model for low-precision (4-bit) quantization to reduce memory usage and potentially increase inference speed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Enable loading the model in 4-bit precision
    bnb_4bit_use_double_quant=True,  # Also quantize the quantization constants to save additional memory
    bnb_4bit_quant_type="nf4",  # Set the quantization type to 'nf4', a specific 4-bit format
    bnb_4bit_compute_dtype=torch.bfloat16  # Use bfloat16 as the datatype for computation to balance performance and accuracy
)

# Load the pre-trained causal language model from Hugging Face's model hub.
# The model is automatically distributed across available GPUs if possible using `device_map="auto"`.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # Automatically distribute the model across available GPUs
    attn_implementation="flash_attention_2",  # Use an optimized attention mechanism for better performance
    torch_dtype=torch.bfloat16,  # Use bfloat16 as the default tensor data type for all model parameters
    quantization_config=bnb_config  # Apply the defined quantization configuration
)

# Load the tokenizer associated with the model, enabling fast tokenization.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
tokenizer.padding_side = 'right'  # Pad on the right, the usual choice for training, which also prevents warnings

# Use the unknown token (unk_token) as the padding token so that padded batches have a defined pad id.
# Note: if the tokenizer defines no unk_token, these assignments leave the pad token unset (None);
# setup_chat_format below is expected to assign its own pad token in any case.
tokenizer.pad_token = tokenizer.unk_token
tokenizer.pad_token_id = tokenizer.unk_token_id

# Set up the chat format using the ChatML template, the chat markup format introduced by OpenAI.
# This adds the ChatML special tokens to the tokenizer, resizes the model's embedding layer to match,
# and installs the chat template used to render conversations.
model, tokenizer = setup_chat_format(model, tokenizer)




Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
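To illustrate what the ChatML layout looks like once a conversation is rendered to text, here is a minimal pure-Python sketch. It is an approximation written for this explanation, assuming the usual `<|im_start|>` and `<|im_end|>` markers; the template that `tokenizer.apply_chat_template` actually applies remains the source of truth, and `render_chatml` is a hypothetical helper.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts as a ChatML-style string."""
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "system", "content": "SCHEMA:\nCREATE TABLE gymnast (Horizontal_Bar_Points VARCHAR)"},
    {"role": "user", "content": "What is the average horizontal bar points for all gymnasts?"},
]

print(render_chatml(messages, add_generation_prompt=True))
```

During training the assistant message is rendered as a third block; at inference time only the system and user messages are rendered, followed by the open assistant turn.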


In [8]:
model
# model structure is not complex

# Embedding: 
# The model begins with an embedding layer with 128,258 tokens each mapped to 4,096-dimensional vectors. 
# This layer converts input token IDs into vectors that the model can process.

# Layers: 
# The core of the model consists of 32 decoder layers stacked sequentially, which are encapsulated within a ModuleList. 
# Each layer is a self-attention block followed by an MLP with three linear projections (gate, up, down), each preceded by an RMSNorm.

# LM Head:
# A linear layer that maps the final output of the decoder stack back to the vocabulary size (128,258), which is used to predict the next token in the sequence.

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128258, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaFlashAttention2(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): 

In [9]:
# Define a function to count the number of trainable parameters in a model.
def count_parameters(model):
    # Iterate over all the parameters in the model, summing up the number of elements (numel) for those parameters that are trainable (requires_grad=True).
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

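# Print the number of parameters with requires_grad=True. Because the linear layers were loaded as frozen 4-bit weights,
# this count covers only the remaining non-quantized parameters (such as embeddings and norms), not the full 8B.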
print("Number of trainable parameters:", count_parameters(model))

Number of trainable parameters: 1050955776
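One plausible way to account for the number above is to add up the parameters that were not quantized: the token embedding, the output projection (LM head), and the RMSNorm weights. The sketch below does that arithmetic from the shapes in the model printout. It rests on assumptions that the truncated printout does not confirm: that the LM head is an untied, bias-free `4096 x 128258` projection, and that each RMSNorm holds a single weight vector of length 4096.

```python
vocab_size = 128258   # from Embedding(128258, 4096) in the printout above
hidden_size = 4096
num_layers = 32

# Token embedding matrix.
embed_tokens = vocab_size * hidden_size

# Assumption: the LM head is a separate (untied) bias-free linear layer.
lm_head = hidden_size * vocab_size

# Assumption: one weight vector per RMSNorm; two norms per decoder layer plus one final norm.
norms = (2 * num_layers + 1) * hidden_size

total = embed_tokens + lm_head + norms
print(f"embed_tokens: {embed_tokens:,}")
print(f"lm_head:      {lm_head:,}")
print(f"norms:        {norms:,}")
print(f"total:        {total:,}")
print("matches reported count:", total == 1050955776)
```

Under these assumptions the sum equals the reported 1,050,955,776, which suggests that the 4-bit `Linear4bit` weights are not counted because they do not require gradients.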


In [10]:
# Load the JSON Lines training data for the text-to-SQL task from disk
dataset = load_dataset("json", data_files="data/train_dataset.json", split="train")

Generating train split: 0 examples [00:00, ? examples/s]

In [11]:
# Configure the LoRA settings for model adaptation based on the QLoRA paper and Sebastian Raschka's experiments.
# LoRA introduces low-rank matrices that modify the behavior of pre-trained models with minimal additional parameters.
peft_config = LoraConfig(
    lora_alpha=128,  # Scaling factor for the LoRA update; the low-rank update is scaled by lora_alpha / r.
    lora_dropout=0.05,  # Dropout probability applied in the LoRA layers to reduce overfitting.
    r=256,  # The rank of the low-rank matrices. A higher rank increases adapter capacity and the number of trainable parameters.
    bias="none",  # Do not train any bias parameters; only the LoRA matrices are updated.
    target_modules="all-linear",  # Apply LoRA adaptation to all linear modules in the model.
    # Alternatively, specify particular modules to apply LoRA using a list, e.g., ["embed_tokens", "lm_head", "q_proj", "v_proj"].
    task_type="CAUSAL_LM",  # The type of task for which the model is being adapted. Here, it's causal language modeling.
)
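
As a rough sanity check on the configuration above, the following sketch estimates how many trainable parameters LoRA adds with `r=256` and what scaling factor `lora_alpha / r` results. The layer shapes are copied from the model printout earlier in this notebook. It assumes that `"all-linear"` covers the seven linear projections in each decoder layer and leaves the output head out; treat the result as an estimate, not a value reported by PEFT.

```python
r = 256
lora_alpha = 128
num_layers = 32

# (in_features, out_features) of each linear module in one decoder layer,
# copied from the model printout.
linear_shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * in_features + out_features * rank


per_layer = sum(lora_params(i, o, r) for i, o in linear_shapes.values())
total_lora_params = per_layer * num_layers
scaling = lora_alpha / r

print(f"LoRA parameters per decoder layer: {per_layer:,}")
print(f"LoRA parameters in total:          {total_lora_params:,}")
print(f"Scaling factor lora_alpha / r:     {scaling}")
```

With a rank this high the adapter is large (several hundred million parameters under these assumptions), so a smaller `r` is worth considering when GPU memory is tight.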


In [12]:
args = TrainingArguments(

    output_dir="Meta-Llama-3-8B-text-to-sql-flash-attention-2",    # directory to save and repository id

    num_train_epochs=3,                     # number of training epochs
    per_device_train_batch_size=3,          # batch size per device during training
    gradient_accumulation_steps=2,          # number of batches to accumulate gradients over before each optimizer update
    gradient_checkpointing=True,            # use gradient checkpointing to save memory
    optim="adamw_torch_fused",              # use fused adamw optimizer
    logging_steps=10,                       # log every 10 steps
    save_strategy="epoch",                  # save checkpoint every epoch
    learning_rate=2e-4,                     # learning rate, based on QLoRA paper
    bf16=True,                              # use bfloat16 precision
    tf32=True,                              # use tf32 precision
    max_grad_norm=0.3,                      # max gradient norm based on QLoRA paper
    warmup_ratio=0.03,                      # warmup ratio based on QLoRA paper
    lr_scheduler_type="constant",           # constant learning rate; the plain "constant" schedule applies no warmup ("constant_with_warmup" would)
    push_to_hub=True,                       # push model to hub
    report_to="tensorboard",                # report metrics to tensorboard
)
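
The batch-related arguments above combine into an effective batch size per optimizer update. The short calculation below spells this out. It assumes a single GPU and uses the 3072-token sequence length that this notebook sets for packing, so the token figure is an upper bound under those assumptions.

```python
per_device_train_batch_size = 3
gradient_accumulation_steps = 2
num_devices = 1          # assumption: training on a single GPU
max_seq_length = 3072    # packing length used in this notebook

# Sequences that contribute to one optimizer update.
sequences_per_update = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# With packing, every sequence is a full block of max_seq_length tokens.
max_tokens_per_update = sequences_per_update * max_seq_length

print("Sequences per optimizer update:", sequences_per_update)
print("Tokens per optimizer update (at most):", max_tokens_per_update)
```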

In [13]:
# Define the maximum sequence length in tokens. With packing enabled, tokenized examples are concatenated and cut into blocks of this length.
max_seq_length = 3072  # Max sequence length for model input and dataset packing

# Instantiate a trainer for supervised fine-tuning (SFT) with specific configurations for LoRA and other settings.
trainer = SFTTrainer(
    model=model,  # The pre-trained model that will be fine-tuned
    args=args,  # Training arguments, typically including learning rate, training epochs, etc.
    train_dataset=dataset,  # The dataset used for training the model
    peft_config=peft_config,  # Parameter-Efficient Fine-Tuning configuration as defined earlier (includes LoRA settings)
    max_seq_length=max_seq_length,  # Use the defined maximum sequence length for tokenizing and batching
    tokenizer=tokenizer,  # The tokenizer for processing text data into a format suitable for the model
    packing=True,  # Enable sequence packing to more efficiently handle variable-length sequences within batches
    dataset_kwargs={
        "add_special_tokens": False,  # Do not add special tokens (such as BOS/EOS) automatically, because the chat template already inserts them
        "append_concat_token": False,  # Avoid adding an extra separator token at the end of sequences
        # "dataset_text_field": "text",  # Uncomment this line to specify which field in the dataset contains the text data if needed
    }
)


Generating train split: 0 examples [00:00, ? examples/s]
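The `packing=True` option above can be pictured as concatenating the tokenized examples into one long stream and cutting that stream into fixed-length blocks. The sketch below illustrates the idea with plain lists of token ids. It is a simplified stand-in, not TRL's implementation: `pack_sequences` is a hypothetical helper, the token ids are made up, and details such as separator tokens and buffering are left out.

```python
def pack_sequences(tokenized_examples, seq_length):
    """Concatenate token-id lists and cut them into blocks of exactly seq_length."""
    buffer = []
    for token_ids in tokenized_examples:
        buffer.extend(token_ids)
    # Drop the incomplete tail so that every block has the same length.
    usable = len(buffer) // seq_length * seq_length
    return [buffer[i:i + seq_length] for i in range(0, usable, seq_length)]


# Four short "examples" of different lengths (made-up token ids).
examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
blocks = pack_sequences(examples, seq_length=4)
print(blocks)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Because short examples share a block, little compute is spent on padding, at the cost of example boundaries falling in the middle of a block.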



In [14]:
# start training, the model will be automatically saved to the hub and the output directory
trainer.train()

# save model
trainer.save_model()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in torch.bfloat16.


Step,Training Loss
10,0.9287
20,0.6393
30,0.5899
40,0.5733
50,0.5602
60,0.5199
70,0.4887
80,0.4759
90,0.474
100,0.4804




In [15]:

# free the memory again
del model
del trainer
torch.cuda.empty_cache()
     

## Inference

In [16]:


peft_model_id = "./Meta-Llama-3-8B-text-to-sql-flash-attention-2"


# Load Model with PEFT adapter
model = AutoPeftModelForCausalLM.from_pretrained(
  peft_model_id,
  device_map="auto",
  torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# load into pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'JambaForCaus

In [17]:
# Load our test dataset
eval_dataset = load_dataset("json", data_files="data/test_dataset.json", split="train")
rand_idx = randint(0, len(eval_dataset) - 1)  # randint is inclusive on both ends, so subtract 1 to stay in range

# Test on sample
prompt = pipe.tokenizer.apply_chat_template(eval_dataset[rand_idx]["messages"][:2], tokenize=False, add_generation_prompt=True)
# Greedy decoding (do_sample=False); sampling parameters such as temperature, top_k, and top_p would be ignored, so they are omitted.
outputs = pipe(prompt, max_new_tokens=256, do_sample=False, eos_token_id=pipe.tokenizer.eos_token_id, pad_token_id=pipe.tokenizer.pad_token_id)

print(f"Query:\n{eval_dataset[rand_idx]['messages'][1]['content']}")
print(f"Original Answer:\n{eval_dataset[rand_idx]['messages'][2]['content']}")
print(f"Generated Answer:\n{outputs[0]['generated_text'][len(prompt):].strip()}")

Generating train split: 0 examples [00:00, ? examples/s]



Query:
What is the average horizontal bar points for all gymnasts?
Original Answer:
SELECT AVG(Horizontal_Bar_Points) FROM gymnast
Generated Answer:
SELECT AVG(Horizontal_Bar_Points) FROM gymnast
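
A single sample gives only an impression of quality. One simple way to evaluate over many test samples is exact-match accuracy between generated and reference queries, sketched below. The helper names `normalize_sql` and `exact_match_accuracy` are hypothetical, and the predictions are made up to show one match and one mismatch; they are not outputs of the fine-tuned model.

```python
def normalize_sql(query):
    """Collapse whitespace, drop a trailing semicolon, and lowercase the query."""
    return " ".join(query.strip().rstrip(";").split()).lower()


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        normalize_sql(pred) == normalize_sql(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


references = [
    "SELECT AVG(Horizontal_Bar_Points) FROM gymnast",
    "SELECT opponent FROM table_name_21 WHERE game = 7",
]
# Made-up predictions for illustration only.
predictions = [
    "select avg(Horizontal_Bar_Points)  from gymnast;",
    "SELECT opponent FROM table_name_21 WHERE game = 8",
]

accuracy = exact_match_accuracy(predictions, references)
print(f"Exact-match accuracy: {accuracy:.2f}")
```

Exact match is strict: two queries that return the same rows but differ in wording count as a mismatch, and lowercasing also ignores case differences inside string literals. Executing both queries against a test database and comparing the results is a more faithful, though heavier, alternative.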
