<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/FTA_DEMO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -q -U bitsandbytes
!pip install -q -U transformers
!pip install -q -U datasets
!pip install -q -U accelerate
!pip install -q -U peft
!pip install -q -U trl


In [2]:
!nvidia-smi

Sat Feb 22 19:28:57 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:00:04.0 Off |                    0 |
| N/A   29C    P0             41W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                

In [1]:
import os

# Keep Weights & Biases from syncing runs to the cloud.
# Note: `WANDB_DISABLED` is deprecated and will be removed in transformers v5;
# the supported replacement is `report_to="none"` in TrainingArguments.
os.environ["WANDB_MODE"] = "offline"
os.environ["WANDB_DISABLED"] = "true"


!pip install transformers accelerate --quiet

from transformers import TrainingArguments
import accelerate

# Initialize the Accelerator
accelerator = accelerate.Accelerator()

## FineTuningAgent-OODA

In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training # Import LoraConfig
import warnings
from trl import SFTTrainer
warnings.filterwarnings("ignore")

class FineTuningAgent:
    """
    An agent that fine-tunes a language model for text-to-SQL translation,
    structured according to the OODA loop.
    """

    def __init__(self, model_id, dataset_name, config):
        """
        Initializes the FineTuningAgent with model ID, dataset name, and configuration.
        """
        self.model_id = model_id
        self.dataset_name = dataset_name
        self.config = config
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    def _observe(self):
        """
        Observes the environment by loading the model, tokenizer, and dataset.
        """
        # 1. Load Model and Tokenizer (with quantization if enabled)
        quantization_config = None
        if self.config.get("quantization"):
            quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_use_double_quant=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch.bfloat16,
            )

        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_id,
            quantization_config=quantization_config,
            trust_remote_code=True,
        )
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_id, trust_remote_code=True)


        self.model.config.use_cache=False
        self.model.gradient_checkpointing_enable() #enable gradient checkpoint

        # Add padding token if it does not exist
        if self.tokenizer.pad_token is None:
            self.tokenizer.add_special_tokens({'pad_token': '[PAD]'})
            self.model.resize_token_embeddings(len(self.tokenizer))

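        # Move model to device. Quantized (bitsandbytes) models are already placed
        # on the GPU at load time, so only unquantized models need an explicit move.
        if quantization_config is None:
            self.model.to(self.device)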

        # 2. Load Dataset (using dataset name from Hugging Face Hub)
        # The full train split has 78,577 rows; keep a shuffled subset of 52,500.
        dataset = load_dataset(self.dataset_name, split="train")
        self.dataset = dataset.shuffle().select(range(52500))  # Limit to 52,500 examples

    def _orient(self):
        """
        Orients the agent by formatting the dataset and preparing training arguments.
        """
        # Convert dataset to OAI messages
        system_message = """You are a text-to-SQL query translator. The user will provide you with a table schema and a question, and you will generate a SQL query to answer the question.
        SCHEMA:

        {schema}"""

        def create_conversation(sample):
            return {
                "messages": [
                    {"role": "system", "content": system_message.format(schema=sample["context"])},
                    {"role": "user", "content": sample["question"]},
                    {"role": "assistant", "content": sample["answer"]}
                ],
                # Retain original columns
                "question": sample["question"],
                "context": sample["context"],
                "answer": sample["answer"]
            }

        self.dataset = self.dataset.map(create_conversation, remove_columns=self.dataset.column_names)
        self.dataset = self.dataset.train_test_split(test_size=2500 / 52500)

        # 3. Prepare Training Arguments
        self.training_args = TrainingArguments(**self.config.get("training_args"))
        self.training_args.remove_unused_columns = False

    def _decide(self):
        """
        Decides on the fine-tuning strategy, including LoRA configuration.
        """
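        # 4. PEFT Configuration (LoRA)
        # Default to None so _act() can still read self.peft_config when LoRA is disabled
        self.peft_config = None
        if self.config.get("lora"):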
            self.model = prepare_model_for_kbit_training(self.model)

            peft_config = LoraConfig(
                lora_alpha=128,
                lora_dropout=0.05,
                r=256,
                bias="none",
                target_modules="all-linear",
                task_type="CAUSAL_LM",
            )

            self.peft_config = peft_config

            self.model = get_peft_model(self.model, peft_config)
            print('\n')
            self.model.print_trainable_parameters()
            print('\n')

    def _act(self):
        """
        Acts by preprocessing the dataset and initializing the Trainer.
        """
        # 5. Preprocess the data
        self.dataset = self.dataset.map(
            self._preprocess_function,
            batched=True,
            remove_columns=self.dataset["train"].column_names,
        )

        # 6. Initialize Trainer
        self.trainer = SFTTrainer(
            model=self.model,
            args=self.training_args,
            train_dataset=self.dataset["train"],
            eval_dataset=self.dataset["test"],
            preprocess_logits_for_metrics=None,  # expects a callable or None, not a bool
            peft_config=self.peft_config,
        )



    def _preprocess_function(self, examples):
        """
        Preprocesses the data by combining question, context, and answer into one
        sequence per example, tokenizing it, and deriving the labels from the input IDs.
        """
        # A causal LM predicts the next token of its own input, so the answer must be
        # part of the same sequence as the prompt rather than a separately padded tensor.
        texts = [
            f"### Question: {q} ### Context: {c} ### Answer: {a}{self.tokenizer.eos_token}"
            for q, c, a in zip(examples["question"], examples["context"], examples["answer"])
        ]
        # Pad or truncate every sequence to 1024 tokens
        model_inputs = self.tokenizer(texts, max_length=1024, truncation=True, padding="max_length")
        # Labels are the input IDs, with padding positions set to -100 so the loss ignores them
        model_inputs["labels"] = [
            [tok if mask == 1 else -100 for tok, mask in zip(ids, attn)]
            for ids, attn in zip(model_inputs["input_ids"], model_inputs["attention_mask"])
        ]
        return model_inputs

    def run(self):
        """
        Executes the OODA loop and fine-tunes the language model.
        """
        self._observe()
        self._orient()
        self._decide()
        self._act()

        # Train the model
        self.trainer.train()

    def evaluate(self):
        """
        Evaluates the fine-tuned language model.
        """
        return self.trainer.evaluate()
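
The chat-style record that `_orient` builds can be inspected without loading the model or the dataset. The sketch below is a standalone copy of that logic for illustration only; the sample record is written by hand and is hypothetical, and the template wording mirrors the system message in the class.

```python
# Standalone copy of the prompt-building logic from `_orient`, for inspection only.
SYSTEM_TEMPLATE = (
    "You are a text-to-SQL query translator. The user will provide you with a table "
    "schema and a question, and you will generate a SQL query to answer the question.\n"
    "SCHEMA:\n\n{schema}"
)


def create_conversation(sample):
    """Turn one {question, context, answer} record into a chat-style record."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_TEMPLATE.format(schema=sample["context"])},
            {"role": "user", "content": sample["question"]},
            {"role": "assistant", "content": sample["answer"]},
        ]
    }


# Hypothetical record, written by hand for illustration.
sample = {
    "question": "Which employees earn more than 50000?",
    "context": "CREATE TABLE employees (name VARCHAR, salary INTEGER)",
    "answer": "SELECT name FROM employees WHERE salary > 50000",
}

conversation = create_conversation(sample)
for message in conversation["messages"]:
    print(f"[{message['role']}] {message['content']}")
```

Each record carries the schema in the system turn, the natural-language question in the user turn, and the target SQL in the assistant turn.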

In [5]:
# Example Usage
config = {
    "training_args": {
        "output_dir": "./results",
        "num_train_epochs": 1,
        "per_device_train_batch_size": 3,
        "gradient_accumulation_steps": 2,
        "report_to":"none",                       # disable all logging integrations (e.g. W&B)
        "gradient_checkpointing":True,            # use gradient checkpointing to save memory
        "optim":'adamw_torch_fused',              # use fused adamw optimizer
        "logging_steps":500,                      # log every 500 steps
        "save_strategy":'epoch',                  # save checkpoint every epoch
        "learning_rate":2e-4,                     # learning rate, based on QLoRA paper
        "bf16":True,                              # use bfloat16 precision
        "tf32":True,                              # use tf32 precision
        "max_grad_norm":0.3,                      # max gradient norm based on QLoRA paper
        "warmup_ratio":0.03,                      # warmup ratio based on QLoRA paper
        "lr_scheduler_type":'constant',
    },
    "quantization": True,
    "lora": True,  # Enable LoRA
}

agent = FineTuningAgent(
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    dataset_name="b-mc2/sql-create-context",
    config=config,
)
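
As a rough sanity check on the configuration above, the effective batch size and the approximate number of optimizer updates per epoch can be worked out by hand. This is a back-of-the-envelope sketch, assuming a single GPU (as in the `nvidia-smi` output) and the 50,000/2,500 train/test split produced by `_orient`; the exact step count reported by the trainer may differ slightly depending on how the last partial batch is handled.

```python
import math

train_examples = 52_500 - 2_500   # shuffled subset minus the held-out test rows
per_device_batch = 3              # per_device_train_batch_size
grad_accum_steps = 2              # gradient_accumulation_steps
num_gpus = 1                      # assumption: one A100, as shown by nvidia-smi
logging_steps = 500

# Number of examples that contribute to each optimizer update
effective_batch = per_device_batch * grad_accum_steps * num_gpus

# Approximate optimizer updates in one epoch
approx_updates = math.ceil(train_examples / effective_batch)

# Approximate number of loss rows logged during that epoch
approx_log_rows = approx_updates // logging_steps

print(f"effective batch size: {effective_batch}")
print(f"approx. updates per epoch: {approx_updates}")
print(f"approx. logged loss rows: {approx_log_rows}")
```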

In [None]:
# Initiate the OODA loop and fine-tuning process
agent.run()

`low_cpu_mem_usage` was None, now default to True since model is quantized.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
The new lm_head weights will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`


Map:   0%|          | 0/52500 [00:00<?, ? examples/s]

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).




trainable params: 671,088,640 || all params: 7,912,828,928 || trainable%: 8.4810




Map:   0%|          | 0/50000 [00:00<?, ? examples/s]

Map:   0%|          | 0/2500 [00:00<?, ? examples/s]

Converting train dataset to ChatML:   0%|          | 0/50000 [00:00<?, ? examples/s]

Applying chat template to train dataset:   0%|          | 0/50000 [00:00<?, ? examples/s]

Applying chat template to train dataset:   0%|          | 0/50000 [00:00<?, ? examples/s]

Converting eval dataset to ChatML:   0%|          | 0/2500 [00:00<?, ? examples/s]

Applying chat template to eval dataset:   0%|          | 0/2500 [00:00<?, ? examples/s]

Applying chat template to eval dataset:   0%|          | 0/2500 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Step,Training Loss
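
The `trainable params` line in the output above can be reproduced by hand. A LoRA adapter on a linear layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` weights. The sketch below assumes the standard Mistral-7B layer shapes (hidden size 4096, MLP size 14336, 8 key/value heads of 128 dimensions, 32 decoder layers), none of which are stated in the notebook, and assumes `target_modules="all-linear"` covers the seven projections in each decoder layer but not the output head.

```python
# Assumed Mistral-7B shapes (not stated in the notebook).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024        # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32
RANK = 256           # `r` from the LoraConfig in `_decide`

# (in_features, out_features) for each linear projection in one decoder layer
linear_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """Weights added by one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in linear_shapes.values())
trainable = per_layer * NUM_LAYERS

all_params = 7_912_828_928   # copied from the log line above
percent = f"{100 * trainable / all_params:.4f}"

print(f"trainable params: {trainable:,} || trainable%: {percent}")
```

Under these assumptions the rank of 256 accounts for the relatively large adapter: each decoder layer contributes about 21 million trainable weights.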


In [None]:
eval_results = agent.evaluate()

print('\n')
print(eval_results)
print('\n')