<a href="https://colab.research.google.com/github/olonok69/LLM_Notebooks/blob/main/mlflow/qlora/LLama3_2_3B_fine_tuning_QLORA_DORA_customer_service.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning Open-Source LLM using QLoRA with MLflow and PEFT
- meta-llama/Llama-3.2-3B-Instruct: The Llama 3.2 family of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. According to the model card, they outperform many of the available open-source and closed chat models on common industry benchmarks.
- QLoRA is a method for fine-tuning large foundation models with limited GPU resources. It reduces the number of trainable parameters by learning pairs of low-rank decomposition matrices (LoRA), and it applies 4-bit quantization to the frozen pretrained weights to further reduce the memory footprint.
- PEFT is a library developed by Hugging Face that lets developers apply parameter-efficient fine-tuning methods to pretrained models from the Hugging Face Hub. With PEFT, you can apply QLoRA to a pretrained model with a few lines of configuration and run fine-tuning just like regular Transformers training.
- MLflow tracks the large number of configurations, assets, and metrics produced during LLM training. It integrates natively with Transformers and PEFT and helps organize the fine-tuning cycle.
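As a rough illustration of the rank-decomposition idea behind LoRA and QLoRA, the sketch below adds a scaled low-rank update to a frozen weight matrix. It uses plain Python lists and toy dimensions; the helper names (`matvec`, `lora_forward`) and all values are assumptions made for illustration and are not PEFT internals.

```python
# Toy LoRA forward pass: y = W x + (alpha / r) * B (A x)
# Names and sizes are illustrative only.

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen weight W plus the low-rank update B @ A, scaled by alpha / r."""
    base = matvec(W, x)                # frozen pretrained path
    update = matvec(B, matvec(A, x))   # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

d_out, d_in, r, alpha = 4, 4, 1, 2
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
A = [[0.1, 0.2, 0.3, 0.4]]                   # shape (r, d_in)
B_zero = [[0.0] * r for _ in range(d_out)]   # B starts at zero, so training begins from the base model
x = [1.0, 2.0, 3.0, 4.0]

y_initial = lora_forward(W, A, B_zero, x, alpha, r)

full_params = d_out * d_in          # parameters in the frozen matrix
lora_params = r * (d_in + d_out)    # parameters in A and B
print(y_initial, full_params, lora_params)
```

The saving grows with the matrix size: the adapter has `r * (d_in + d_out)` parameters, so it is only small when `r` is much smaller than the matrix dimensions.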


# Dataset

### Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset


This hybrid synthetic dataset is designed for fine-tuning Large Language Models such as GPT, Mistral and OpenELM. It was generated using Bitext's NLP/NLG technology and automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be achieved with a two-step approach to LLM fine-tuning. For example, if you are [ACME Company], you can create your own customized LLM by first fine-tuning a model on this dataset, and then further fine-tuning it with a small amount of your own data. An overview of this approach can be found in "From General-Purpose LLMs to Verticalized Enterprise Models".

The dataset has the following specs:

- Use Case: Intent Detection
- Vertical: Customer Service
- 27 intents assigned to 10 categories
- 26872 question/answer pairs, around 1000 per intent
- 30 entity/slot types
- 12 different types of language generation tags

In [1]:
%pip install -U transformers -q
%pip install -U datasets  -q
%pip install -U accelerate  -q
%pip install -U peft -q
%pip install -U trl -q
%pip install -U bitsandbytes -q
%pip install mlflow pyngrok -q

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
trl 0.12.2 requires transformers<4.47.0, but you have transformers 4.47.0 which is incompatible.

In [2]:
from google.colab import userdata
import mlflow
import os

MLFLOW_TRACKING_URI="databricks"
# Specify the workspace hostname and token
DATABRICKS_HOST="https://adb-2467347032368999.19.azuredatabricks.net/"
DATABRICKS_TOKEN=userdata.get('DATABRCKS_TTOKEN')

In [3]:
if "MLFLOW_TRACKING_URI" not in os.environ:
    os.environ["MLFLOW_TRACKING_URI"] = MLFLOW_TRACKING_URI
if "DATABRICKS_HOST" not in os.environ:
    os.environ["DATABRICKS_HOST"] = DATABRICKS_HOST
if "DATABRICKS_TOKEN" not in os.environ:
    os.environ["DATABRICKS_TOKEN"] = DATABRICKS_TOKEN
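The three conditionals above set a variable only when it is not already defined. A minimal sketch of the same pattern with `setdefault` follows; the helper name `apply_defaults` and the placeholder URLs are hypothetical, and the demo runs on a plain dict so that it does not touch the real environment.

```python
def apply_defaults(env, defaults):
    """Set each key only if it is missing, like the three `if` checks above."""
    for key, value in defaults.items():
        env.setdefault(key, value)
    return env

# Demonstrated on a plain dict; pass os.environ to affect the process environment.
demo_env = {"DATABRICKS_HOST": "https://already-set.example"}
apply_defaults(demo_env, {
    "MLFLOW_TRACKING_URI": "databricks",
    "DATABRICKS_HOST": "https://placeholder.example",
})
print(demo_env)
```

Values written to `os.environ` must be strings, so a missing Colab secret (which would yield `None`) needs to be handled before assignment.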

In [4]:

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

mlflow.set_experiment("/Users/pepe@kk.com/llama3.2_finetuning")

<Experiment: artifact_location='dbfs:/databricks/mlflow-tracking/4138888342068537', creation_time=1733930612576, experiment_id='4138888342068537', last_update_time=1733940460091, lifecycle_stage='active', name='/Users/pepe@kk.com/llama3.2_finetuning', tags={'mlflow.experiment.sourceName': '/Users/pepe@kk.com/llama3.2_finetuning',
 'mlflow.experimentType': 'MLFLOW_EXPERIMENT',
 'mlflow.ownerEmail': 'pepe@kk.com',
 'mlflow.ownerId': '1331640755799986'}>

In [5]:
import warnings

# Disable a few less-than-useful UserWarnings from setuptools and pydantic
warnings.filterwarnings("ignore", category=UserWarning)

In [6]:
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging

    )

from peft import (LoraConfig,
                 PeftModel,
                 prepare_model_for_kbit_training,
                 get_peft_model)

import torch
from datasets import load_dataset
from trl import SFTTrainer, setup_chat_format


# Model
https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
# Dataset
https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset

In [7]:
base_model = "meta-llama/Llama-3.2-3B-Instruct"
new_model = "llama-3.2-3b-it-Ecommerce-ChatBot"
dataset_name = "bitext/Bitext-customer-support-llm-chatbot-training-dataset"

In [8]:
# Set torch dtype and attention implementation
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install -qqq flash-attn
    torch_dtype = torch.bfloat16
    attn_implementation = "flash_attention_2"
else:
    torch_dtype = torch.float16
    attn_implementation = "eager"
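The branch above keys on the major CUDA compute capability: GPUs reporting 8 or higher (Ampere and newer) support bfloat16 and FlashAttention-2, while older GPUs fall back to float16 with eager attention. The sketch below mirrors that decision with strings instead of torch objects so that it runs without a GPU; `pick_precision` is a hypothetical helper name.

```python
def pick_precision(compute_capability_major):
    """Mirror the dtype/attention branch using strings instead of torch objects."""
    if compute_capability_major >= 8:
        return "bfloat16", "flash_attention_2"
    return "float16", "eager"

print(pick_precision(8))  # an A100 reports compute capability 8.0
print(pick_precision(7))  # a T4 reports compute capability 7.5
```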

In [9]:
attn_implementation

'flash_attention_2'

In [10]:
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# For 8 bit quantization
#quantization_config = BitsAndBytesConfig(load_in_8bit=True,
#                                        llm_int8_threshold=200.0)

## For 4 bit quantization
quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,)

model = AutoModelForCausalLM.from_pretrained(base_model,
                                             quantization_config=quantization_config,
                                             device_map="auto")
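To give a feel for what 4-bit quantization does to the frozen weights, here is a deliberately simplified sketch of blockwise absmax quantization in plain Python. It uses uniform signed levels, which is an assumption made for brevity: the NF4 type selected above uses a non-uniform codebook, and `bnb_4bit_use_double_quant=True` additionally quantizes the per-block scales. The function names are hypothetical and not part of bitsandbytes.

```python
def quantize_block(values, bits=4):
    """Symmetric absmax quantization of one block to small signed integers."""
    levels = 2 ** (bits - 1) - 1                  # 7 for 4 bits
    absmax = max(abs(v) for v in values) or 1.0   # avoid division by zero
    scale = absmax / levels
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize_block(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]

def quantize_blockwise(values, block_size=4, bits=4):
    """Quantize a flat list of weights block by block, one scale per block."""
    return [
        quantize_block(values[start:start + block_size], bits)
        for start in range(0, len(values), block_size)
    ]

weights = [0.12, -0.50, 0.33, 0.07, 2.00, -1.50, 0.25, 0.00]
blocks = quantize_blockwise(weights)
restored = [v for codes, scale in blocks for v in dequantize_block(codes, scale)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(blocks)
print(max_error)
```

Keeping one scale per small block limits the damage an outlier weight can do: it only coarsens the resolution of its own block.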

In [11]:
# Import the dataset
dataset_raw = load_dataset(dataset_name, split="train")
dataset = dataset_raw.shuffle(seed=65).select(range(1000)) # Only use 1000 samples for quick demo
instruction = """You are a top-rated customer service agent named John.
    Be polite to customers and answer all their questions.
    """
def format_chat_template(row):

    row_json = [{"role": "system", "content": instruction },
               {"role": "user", "content": row["instruction"]},
               {"role": "assistant", "content": row["response"]}]

    row["text"] = tokenizer.apply_chat_template(row_json, tokenize=False)
    return row

dataset = dataset.map(
    format_chat_template,
    num_proc=4,
)
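`apply_chat_template` turns the list of role/content messages into a single string with Llama 3 header and end-of-turn markers. The sketch below approximates that layout in plain Python, based on the rendered prompts printed later in this notebook. It is not the tokenizer's actual template: it omits the "Cutting Knowledge Date" preamble the real template adds to the system turn, and `render_llama3_chat` is a hypothetical name.

```python
def render_llama3_chat(messages, add_generation_prompt=False):
    """Approximate the Llama 3 chat layout (without the date preamble)."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        text += message["content"].strip() + "<|eot_id|>"
    if add_generation_prompt:
        # An open assistant header asks the model to write the reply next
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

example = render_llama3_chat(
    [
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "Where is my order?"},
    ],
    add_generation_prompt=True,
)
print(example)
```

For training rows the assistant turn is included and closed with `<|eot_id|>`; for inference the prompt ends with the open assistant header instead.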

In [12]:
dataset_raw

Dataset({
    features: ['flags', 'instruction', 'category', 'intent', 'response'],
    num_rows: 26872
})

In [13]:
# Splitting the dataset (optional)

split_data = dataset.train_test_split(test_size=0.2, shuffle=True)

train_dataset = split_data["train"]
test_dataset = split_data["test"]

print(f"Training Set Size: {len(train_dataset)}")
print(f"Evaluation Set Size: {len(test_dataset)}")

Training Set Size: 800
Evaluation Set Size: 200
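The split above is shuffled without a fixed seed, so the 800/200 partition can differ between runs; `train_test_split` accepts a `seed` argument to make it repeatable. The plain-Python sketch below shows the underlying idea of a seeded shuffle followed by a cut. The helper name and the rounding rule are assumptions, not the `datasets` implementation.

```python
import random

def train_test_split_indices(n_rows, test_size=0.2, seed=0):
    """Shuffle row indices with a fixed seed, then cut off the test share."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(1000, test_size=0.2, seed=65)
print(len(train_idx), len(test_idx))
```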


In [14]:
import bitsandbytes as bnb

trained_model_id = "Llama-3.2-3B-sft-lora-bitext"
output_dir = '/content/' + trained_model_id


# LoRA Config
https://huggingface.co/docs/peft/main/en/package_reference/lora#peft.LoraConfig

In [15]:
# LoRA configuration with DoRA (Weight-Decomposed Low-Rank Adaptation) enabled
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    use_dora=True,  # set to False to train plain LoRA adapters instead
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# DoRA paper: https://arxiv.org/pdf/2402.09353
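To get a sense of how many parameters this configuration trains, the sketch below counts the adapter weights for the four targeted attention projections. The layer shapes (width 3072, key/value width 1024, 28 layers) are assumptions taken from the published Llama-3.2-3B configuration and are not stated in this notebook. The DoRA term assumes one magnitude value per output feature.

```python
# Assumed Llama-3.2-3B attention shapes (not stated in this notebook)
HIDDEN = 3072       # model width
KV_DIM = 1024       # 8 key/value heads x 128 head dimension
NUM_LAYERS = 28

# (in_features, out_features) for each targeted projection
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

def adapter_params(r, use_dora=True):
    """Count trainable adapter parameters across all layers."""
    per_layer = 0
    for d_in, d_out in TARGETS.values():
        per_layer += r * (d_in + d_out)   # LoRA matrices A (r x d_in) and B (d_out x r)
        if use_dora:
            per_layer += d_out            # DoRA magnitude vector
    return per_layer * NUM_LAYERS

r, lora_alpha = 64, 16
lora_only = adapter_params(r, use_dora=False)
with_dora = adapter_params(r, use_dora=True)
scaling = lora_alpha / r   # factor applied to the low-rank update
print(lora_only, with_dora, scaling)
```

Under these assumptions the adapters amount to roughly 37 million trainable parameters, and the update is scaled by `lora_alpha / r = 0.25`.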

In [16]:
# Hyperparameters
training_args = TrainingArguments(
    output_dir=new_model,
    overwrite_output_dir=True,
    per_device_eval_batch_size=1, # originally set to 8
    per_device_train_batch_size=1, # originally set to 8
    push_to_hub=True,
    hub_model_id=trained_model_id,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=10,
    logging_strategy="steps",
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    group_by_length=True,
    report_to="mlflow"
)
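The batch settings above determine how many optimizer steps one epoch takes and how often evaluation runs. The arithmetic below is a small sketch using the 800-row training split and a single GPU; it treats the fractional `eval_steps=0.2` as a share of the total number of training steps.

```python
import math

train_rows = 800        # size of the training split
per_device_batch = 1
grad_accum = 2
num_gpus = 1            # single-GPU runtime
epochs = 1
eval_fraction = 0.2     # eval_steps given as a fraction of total steps

effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * epochs
eval_every = math.ceil(total_steps * eval_fraction)

print(effective_batch, total_steps, eval_every)
```

With these values the run takes 400 optimizer steps and evaluates every 80 steps.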

In [17]:
trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        dataset_text_field="text",
        processing_class=tokenizer,
        packing=False,
        peft_config=peft_config,
        max_seq_length=tokenizer.model_max_length,
    )


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


In [19]:
# trainer.processing_class=tokenizer
tokenizer.pad_token = tokenizer.eos_token
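A pad token is needed whenever sequences of different lengths are batched together. The sketch below shows right-padding and the matching attention mask on plain lists of token ids; `pad_batch` and the id values are hypothetical stand-ins for what the tokenizer and data collator do.

```python
def pad_batch(sequences, pad_id):
    """Right-pad token id lists to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = max_len - len(seq)
        input_ids.append(seq + [pad_id] * padding)
        attention_mask.append([1] * len(seq) + [0] * padding)
    return input_ids, attention_mask

PAD_ID = 0  # hypothetical id; the notebook reuses the tokenizer's EOS token instead
ids, mask = pad_batch([[5, 6, 7], [8, 9]], PAD_ID)
print(ids)
print(mask)
```

The mask marks padded positions with 0 so that they are ignored by attention.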

In [20]:
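from datetime import datetime
import pandas as pd

name = "fine_tuning" + datetime.now().strftime("%Y-%m-%d_%H:%M:%S")
with mlflow.start_run(run_name=name) as run:
    # The Trainer's MLflow callback (report_to="mlflow") also logs the training arguments.
    # Logging the raw __dict__ here first leads to INVALID_PARAMETER_VALUE messages for
    # keys whose string forms differ (e.g. eval_strategy, accelerator_config).
    mlflow.log_params(training_args.__dict__)
    trainer.train()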

2024/12/11 19:04:35 ERROR mlflow.utils.async_logging.async_logging_queue: Run Id 939788e363b84b4aaf8182196f8a2bbc: Failed to log run data: Exception: INVALID_PARAMETER_VALUE: Parameter with key eval_strategy was already logged with a value of IntervalStrategy.STEPS. The attempted new value was steps
2024/12/11 19:04:35 ERROR mlflow.utils.async_logging.async_logging_queue: Run Id 939788e363b84b4aaf8182196f8a2bbc: Failed to log run data: Exception: INVALID_PARAMETER_VALUE: Parameter with key accelerator_config was already logged with a value of AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True, non_blocking=False, gradient_accumulation_kwargs=None, use_configured_state=False). The attempted new value was {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 80   | 0.8938        | 0.979614        |
| 160  | 0.9124        | 0.914273        |
| 240  | 0.8281        | 0.862815        |
| 320  | 0.8125        | 0.835615        |
| 400  | 0.709         | 0.825323        |
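The validation losses are mean cross-entropy values, so they can be converted to perplexity with an exponential. A short sketch using the numbers reported for this run:

```python
import math

# Validation loss per evaluation step, as reported for this run
eval_losses = {80: 0.979614, 160: 0.914273, 240: 0.862815, 320: 0.835615, 400: 0.825323}

# Perplexity is the exponential of the mean cross-entropy loss (in nats)
perplexities = {step: math.exp(loss) for step, loss in eval_losses.items()}

for step, ppl in perplexities.items():
    print(step, round(ppl, 3))
```

The perplexity falls steadily over the epoch, which matches the decreasing validation loss.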


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.


🏃 View run fine_tuning2024-12-11_19:04:33 at: https://adb-2467347032368999.19.azuredatabricks.net/ml/experiments/4138888342068537/runs/939788e363b84b4aaf8182196f8a2bbc
🧪 View experiment at: https://adb-2467347032368999.19.azuredatabricks.net/ml/experiments/4138888342068537


In [21]:
# Model inference
messages = [{"role": "system", "content": instruction},
    {"role": "user", "content": "I bought the same item twice, cancel order {{Order Number}}"}]


prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to("cuda")


outputs = model.generate(**inputs, max_new_tokens=150, num_return_sequences=1)

text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(text.split("assistant")[1])

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.




I understand your concern about having bought the same item twice and wanting to cancel order number {{Order Number}}. I'm here to assist you in resolving this issue. To proceed with the cancellation, could you please provide me with some additional details? For example, the item in question, the purchase date, or any other relevant information that will help me locate your order and assist you further. Rest assured, I'm committed to ensuring your satisfaction and resolving this matter promptly. Let's work together to find a solution. How can I further assist you today?
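Extracting the reply with `text.split("assistant")[1]` depends on the word "assistant" not appearing anywhere else in the decoded text. A more robust option is to drop the prompt tokens before decoding, because `generate` returns the prompt followed by the new tokens. The sketch below shows the idea with plain lists standing in for tensors; the ids are made up and `strip_prompt_tokens` is a hypothetical helper.

```python
def strip_prompt_tokens(prompt_ids, generated_ids):
    """Keep only the tokens produced after the prompt."""
    return generated_ids[len(prompt_ids):]

# Plain lists stand in for tensors; the ids are made up.
prompt_ids = [101, 7, 8, 9]
generated_ids = prompt_ids + [42, 43, 44]
new_tokens = strip_prompt_tokens(prompt_ids, generated_ids)
print(new_tokens)
```

With the tensors in this notebook, the equivalent slice would be `outputs[0][inputs["input_ids"].shape[1]:]`, passed to `tokenizer.decode` with `skip_special_tokens=True`.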


In [22]:
prompt

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 11 Dec 2024\n\nYou are a top-rated customer service agent named John.\n    Be polite to customers and answer all their questions.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI bought the same item twice, cancel order {{Order Number}}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'

In [23]:
import pprint
pprint.pprint(text.split("assistant")[1])

('\n'
 '\n'
 'I understand your concern about having bought the same item twice and '
 "wanting to cancel order number {{Order Number}}. I'm here to assist you in "
 'resolving this issue. To proceed with the cancellation, could you please '
 'provide me with some additional details? For example, the item in question, '
 'the purchase date, or any other relevant information that will help me '
 "locate your order and assist you further. Rest assured, I'm committed to "
 "ensuring your satisfaction and resolving this matter promptly. Let's work "
 'together to find a solution. How can I further assist you today?')


In [33]:
format_chat_template(train_dataset[1])

{'flags': 'BLM',
 'instruction': 'I am trying to check what shipping methods are available',
 'category': 'DELIVERY',
 'intent': 'delivery_options',
 'response': "I'll get right on it! I'd be more than happy to assist you in exploring the variety of shipping methods we offer. Here are the available options:\n\n1. Standard Shipping: This option typically takes {{Date Range}} business days for delivery and is perfect if you aren't in a rush.\n2. Expedited Shipping: If you need your items a bit faster, this method will get them to you in {{Date Range}} business days.\n3. Overnight Shipping: For urgent needs, we offer overnight shipping, ensuring you receive your items on the next business day.\n4. In-Store Pickup: If you prefer, you can also opt for in-store pickup at one of our {{Store Location}} for added convenience.\n\nFeel free to choose the method that aligns best with your preferences and schedule. If you have any further questions or need assistance with your decision-making proce

In [24]:
format_chat_template(train_dataset[1])['text']

"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 11 Dec 2024\n\nYou are a top-rated customer service agent named John.\n    Be polite to customers and answer all their questions.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI am trying to check what shipping methods are available<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'll get right on it! I'd be more than happy to assist you in exploring the variety of shipping methods we offer. Here are the available options:\n\n1. Standard Shipping: This option typically takes {{Date Range}} business days for delivery and is perfect if you aren't in a rush.\n2. Expedited Shipping: If you need your items a bit faster, this method will get them to you in {{Date Range}} business days.\n3. Overnight Shipping: For urgent needs, we offer overnight shipping, ensuring you receive your items on the next business day.\n4. In-Store Pickup: If you prefer, you can also o

In [25]:
from mlflow.models import infer_signature

sample = format_chat_template(train_dataset[1])

# MLflow infers schema from the provided sample input/output/params
signature = infer_signature(
    model_input=sample['text'],
    model_output=sample["response"],
    # Parameters are saved with default values if specified
    params={"max_new_tokens": 256, "repetition_penalty": 1.15, "return_full_text": False},
)
signature

inputs: 
  [string (required)]
outputs: 
  [string (required)]
params: 
  ['max_new_tokens': long (default: 256), 'repetition_penalty': double (default: 1.15), 'return_full_text': boolean (default: False)]
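The `params` section of the signature records default generation settings that a caller can override at inference time. The sketch below illustrates that idea as a simple dictionary merge. It is not MLflow's implementation, and how MLflow treats unrecognised keys may differ from the `ValueError` raised here.

```python
SIGNATURE_DEFAULTS = {
    "max_new_tokens": 256,
    "repetition_penalty": 1.15,
    "return_full_text": False,
}

def resolve_params(overrides=None):
    """Merge caller overrides onto the defaults, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(SIGNATURE_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown inference parameters: {sorted(unknown)}")
    return {**SIGNATURE_DEFAULTS, **overrides}

resolved = resolve_params({"max_new_tokens": 64})
print(resolved)
```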

In [26]:
# Prompt template stored with the logged model. It repeats the system instruction used for
# training, and the caller's input is substituted for the single {prompt} variable.
# Unlike the training data, it does not use the Llama chat-template special tokens.
prompt_template = """You are a top-rated customer service agent named John.
    Be polite to customers and answer all their questions.
{prompt}

### Response:
"""

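The template has a single `{prompt}` placeholder. As a sketch of how such a template is filled, assuming ordinary Python string formatting, the example below substitutes a made-up customer message; the variable names `user_input` and `final_prompt` are illustrative.

```python
prompt_template = """You are a top-rated customer service agent named John.
    Be polite to customers and answer all their questions.
{prompt}

### Response:
"""

user_input = "I want to cancel my order"   # made-up customer message
final_prompt = prompt_template.format(prompt=user_input)
print(final_prompt)
```

Because the template ends with `### Response:`, the model is expected to continue the text from that point.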
In [27]:


import datetime
now = datetime.datetime.now()
now.strftime("%Y-%m-%d_%H:%M:%S")

'2024-12-11_19:11:44'

In [28]:
import mlflow

# Get the ID of the MLflow Run that was automatically created above
last_run_id = mlflow.last_active_run().info.run_id

# Save a tokenizer without padding because it is only needed for training
tokenizer_no_pad = AutoTokenizer.from_pretrained(base_model, add_bos_token=True)

# If you interrupt the training, uncomment the following line to stop the MLflow run
# mlflow.end_run()


# Reopen the training run and log the fine-tuned Llama 3.2 PEFT model together with the
# param-included signature, so that generation parameters can be overridden at inference time
now = datetime.datetime.now()

description = """fine tuning Llama3.2 model PEFT
"""
with mlflow.start_run(run_id=last_run_id, description=description) as run:
    mlflow.log_params(peft_config.to_dict())
    mlflow.transformers.log_model(
        # trainer.processing_class replaces the deprecated trainer.tokenizer attribute
        transformers_model={"model": trainer.model, "tokenizer": trainer.processing_class},
        prompt_template=prompt_template,
        signature=signature,
        artifact_path="model",  # This is a relative path to save model files within MLflow run
    )

2024/12/11 19:11:58 INFO mlflow.transformers: Overriding save_pretrained to False for PEFT models, following the Transformers behavior. The PEFT adaptor and config will be saved, but the base model weights will not and reference to the HuggingFace Hub repository will be logged instead.
2024/12/11 19:11:59 INFO mlflow.transformers: Skipping saving pretrained model weights to disk as the save_pretrained argumentis set to False. The reference to the HuggingFace Hub repository meta-llama/Llama-3.2-3B-Instruct will be logged instead.


2024/12/11 19:11:59 INFO mlflow.transformers: text-generation pipelines saved with prompt templates have the `return_full_text` pipeline kwarg set to False by default. To override this behavior, provide a `model_config` dict with `return_full_text` set to `True` when saving the model.
2024/12/11 19:11:59 INFO mlflow.transformers: A local checkpoint path or PEFT model is given as the `transformers_model`. To avoid loading the full model into memory, we don't infer the pip requirement for the model. Instead, we will use the default requirements, but it may not capture all required pip libraries for the model. Consider providing the pip requirements explicitly.




🏃 View run fine_tuning2024-12-11_19:04:33 at: https://adb-2467347032368999.19.azuredatabricks.net/ml/experiments/4138888342068537/runs/939788e363b84b4aaf8182196f8a2bbc
🧪 View experiment at: https://adb-2467347032368999.19.azuredatabricks.net/ml/experiments/4138888342068537


In [29]:
run.to_dictionary()

{'info': {'artifact_uri': 'dbfs:/databricks/mlflow-tracking/4138888342068537/939788e363b84b4aaf8182196f8a2bbc/artifacts',
  'end_time': 1733944250550,
  'experiment_id': '4138888342068537',
  'lifecycle_stage': 'active',
  'run_id': '939788e363b84b4aaf8182196f8a2bbc',
  'run_name': 'fine_tuning2024-12-11_19:04:33',
  'run_uuid': '939788e363b84b4aaf8182196f8a2bbc',
  'start_time': 1733943873778,
  'status': 'RUNNING',
  'user_id': ''},
 'data': {'metrics': {'epoch': 1.0,
   'eval_loss': 0.8253230452537537,
   'eval_runtime': 26.0601,
   'eval_samples_per_second': 7.675,
   'eval_steps_per_second': 7.675,
   'grad_norm': 0.5536566376686096,
   'learning_rate': 0.0,
   'loss': 0.709,
   'total_flos': 2676416701384704.0,
   'train_loss': 0.9513044072687626,
   'train_runtime': 368.9967,
   'train_samples_per_second': 2.168,
   'train_steps_per_second': 1.084},
  'params': {'__cached__setup_devices': 'cuda:0',
   '_n_gpu': '1',
   'accelerator_config': 'AcceleratorConfig(split_batches=False

In [56]:
# import torch
# import gc
# try:
#   del trainer
#   del model
# except:
#   pass
# with torch.no_grad():
#     torch.cuda.empty_cache()
# gc.collect()

5096

In [30]:
# Load the model logged in the run above (last_run_id was captured before logging)
mlflow_model = mlflow.pyfunc.load_model(f"runs:/{last_run_id}/model")

In [31]:
prompt="""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 11 Dec 2024

You are a top-rated customer service agent named John.
    Be polite to customers and answer all their questions.<|eot_id|><|start_header_id|>user<|end_header_id|>

I don't know what to do to change to the {{Account Type}} account<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Not a problem at all! I understand that you're uncertain about the steps required to switch to the {{Account Type}} account. Allow me to provide you with a clear and concise guide:

1. Log into your account: Start by accessing our platform through the login page.

2. Navigate to Account Settings: Once you're logged in, locate the Account Settings or Profile section. This is where you can manage and make changes to your account.

3. Find the Upgrade option: Within the Account Settings or Profile section, look for an option labeled "Upgrade" or "Switch Account Type." Click on it to proceed.

4. Select the Free account: From the list of available account types, choose the "Free" account to switch to it.

5. Confirm the changes: Follow the on-screen instructions to confirm your decision and finalize the switch to the {{Account Type}} account.

If you encounter any difficulties or have further questions, please don't hesitate to reach out to our dedicated customer support team. They're available {{Customer Service Hours}} via {{Customer Support Phone Number}} or through the Live Chat on our website at {{Website URL}}. We're here to assist you every step of the way and ensure a smooth transition to your desired account type.<|eot_id|>"""

In [32]:
mlflow_model.predict(prompt)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


["Your feedback is invaluable to us. It helps us improve our services and better meet your needs. If there's anything else we can assist you with, feel free to let us know. Have a great day!"]

In [35]:
mlflow_model.metadata.to_dict()

{'run_id': '939788e363b84b4aaf8182196f8a2bbc',
 'artifact_path': 'model',
 'utc_time_created': '2024-12-11 19:11:57.952803',
 'flavors': {'python_function': {'config': {'return_full_text': False},
   'env': {'conda': 'conda.yaml', 'virtualenv': 'python_env.yaml'},
   'loader_module': 'mlflow.transformers',
   'python_version': '3.10.12'},
  'transformers': {'code': None,
   'components': ['tokenizer'],
   'framework': 'pt',
   'instance_type': 'TextGenerationPipeline',
   'peft_adaptor': 'peft',
   'pipeline_model_type': 'LlamaForCausalLM',
   'source_model_name': 'meta-llama/Llama-3.2-3B-Instruct',
   'source_model_revision': '0cb88a4f764b7a12671c53f0838cd831a0843b95',
   'task': 'text-generation',
   'tokenizer_name': 'meta-llama/Llama-3.2-3B-Instruct',
   'tokenizer_revision': '0cb88a4f764b7a12671c53f0838cd831a0843b95',
   'tokenizer_type': 'PreTrainedTokenizerFast',
   'torch_dtype': 'torch.float32',
   'transformers_version': '4.46.3'}},
 'model_uuid': '22eefffa71e34ec18847237e8b7