# Fine-tuning Llama-3.1 8B with a text2sql SFT dataset

This notebook installs the required libraries, applies the necessary data transformations, and fine-tunes the model. It runs on paid Google Colab A100 GPU instances. Comments in the code explain how to switch to 4-bit (int4) quantization for a T4 GPU.
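A rough, weights-only estimate shows why the T4 path needs 4-bit weights. The sketch below is an illustration, not part of the original pipeline. It derives the base parameter count from the trainer log later in this notebook (8,072,204,288 total minus 41,943,040 LoRA parameters). It ignores activations, optimizer state, the KV cache, quantization constants, and layers such as the embeddings and output head, which typically stay in 16-bit. Treat the 4-bit figure as a lower bound.

```python
# Base parameter count derived from this notebook's training log:
# 8,072,204,288 total parameters minus 41,943,040 trainable LoRA parameters.
BASE_PARAMS = 8_072_204_288 - 41_943_040  # 8,030,261,248

GIB = 1024 ** 3  # the Colab/Unsloth "GB" figures are actually GiB


def weight_gib(num_params, bits_per_param):
    """Weights-only footprint in GiB (no activations, optimizer state, or KV cache)."""
    return num_params * bits_per_param / 8 / GIB


bf16 = weight_gib(BASE_PARAMS, 16)
int4 = weight_gib(BASE_PARAMS, 4)
print(f"bf16 weights: {bf16:.2f} GiB")   # bf16 weights: 14.96 GiB
print(f"4-bit weights: {int4:.2f} GiB")  # 4-bit weights: 3.74 GiB
```

A T4 has 16 GB of memory, so about 15 GiB of bf16 weights leaves essentially no room for training. The A100-40GB used here (39.557 GB reported) has ample headroom.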

The Unsloth library is used to lower GPU memory requirements and speed up training and inference. See https://github.com/unslothai/unsloth

To upload the training dataset, click the Folder icon on the left-hand side of your Colab notebook and upload your training data file, `train_text2sql_sft_dataset.json` (see the instructions for generating the dataset in the "finetuning" parent directory).
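Before training, it can help to check that the uploaded file has the layout the data-prep step expects. That step reads one JSON object per line and takes `messages[0]`, `[1]`, and `[2]` as instruction, input, and response. The helper below, `validate_sft_file`, is a hypothetical sketch and not part of the original notebook. The printed sample later in the notebook confirms the `"system"` role for the first message. The `"user"` and `"assistant"` roles for the other two are an assumption based on the usual chat convention.

```python
import json

# Assumed role order; only "system" is confirmed by the printed sample in this notebook.
EXPECTED_ROLES = ["system", "user", "assistant"]


def validate_sft_file(path):
    """Return the number of valid records; raise ValueError on the first malformed line."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # each line must be one JSON object (JSON Lines)
            messages = record.get("messages")
            if not isinstance(messages, list) or len(messages) < 3:
                raise ValueError(f"line {lineno}: expected at least 3 messages")
            roles = [m.get("role") for m in messages[:3]]
            if roles != EXPECTED_ROLES:
                raise ValueError(f"line {lineno}: unexpected roles {roles}")
            count += 1
    return count
```

After `training_data_file` is set in the next cell, `validate_sft_file(training_data_file)` should return the same count as the `num_rows` that `load_dataset` reports further down.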

In [None]:
# Click the Folder icon on the left-hand side of your Colab notebook and upload your training data file: train_text2sql_sft_dataset.json
training_data_file = "train_text2sql_sft_dataset.json"

In [None]:
# install libraries
!pip install -U datasets

!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
!pip install --no-deps unsloth

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = False # Use 4bit quantization to reduce memory usage. Can be False.

# Unsloth 4bit models
# fourbit_models = [
#     "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
#     "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
#     "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
#     "unsloth/Meta-Llama-3.1-405B-bnb-4bit",
# ]

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B-Instruct", # change to "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit" and set load_in_4bit = True for 4-bit
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2025.6.8: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



In [None]:
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
[
    alpaca_prompt.format(
        "You are a text to SQL query translator. Using the SQLite DB Schema and the External Knowledge, translate the following text question into a SQLite SQL select statement.", # instruction
        "-- DB Schema: CREATE TABLE \"lists\"\n(\n    user_id                     INTEGER\n        references lists_users (user_id),\n    list_id                     INTEGER not null\n        primary key,\n    list_title                  TEXT,\n    list_movie_number           INTEGER,\n    list_update_timestamp_utc   TEXT,\n    list_creation_timestamp_utc TEXT,\n    list_followers              INTEGER,\n    list_url                    TEXT,\n    list_comments               INTEGER,\n    list_description            TEXT,\n    list_cover_image_url        TEXT,\n    list_first_image_url        TEXT,\n    list_second_image_url       TEXT,\n    list_third_image_url        TEXT\n)\n\nCREATE TABLE \"movies\"\n(\n    movie_id             INTEGER not null\n        primary key,\n    movie_title          TEXT,\n    movie_release_year   INTEGER,\n    movie_url            TEXT,\n    movie_title_language TEXT,\n    movie_popularity     INTEGER,\n    movie_image_url      TEXT,\n    director_id          TEXT,\n    director_name        TEXT,\n    director_url         TEXT\n)\n\nCREATE TABLE \"ratings_users\"\n(\n    user_id                 INTEGER\n        references lists_users (user_id),\n    rating_date_utc         TEXT,\n    user_trialist           INTEGER,\n    user_subscriber         INTEGER,\n    user_avatar_image_url   TEXT,\n    user_cover_image_url    TEXT,\n    user_eligible_for_trial INTEGER,\n    user_has_payment_method INTEGER\n)\n\nCREATE TABLE lists_users\n(\n    user_id                 INTEGER not null ,\n    list_id                 INTEGER not null ,\n    list_update_date_utc    TEXT,\n    list_creation_date_utc  TEXT,\n    user_trialist           INTEGER,\n    user_subscriber         INTEGER,\n    user_avatar_image_url   TEXT,\n    user_cover_image_url    TEXT,\n    user_eligible_for_trial TEXT,\n    user_has_payment_method TEXT,\n    primary key (user_id, list_id),\n    foreign key (list_id) references lists(list_id),\n    foreign key (user_id) references lists(user_id)\n)\n\nCREATE TABLE ratings\n(\n    movie_id                INTEGER,\n    rating_id               INTEGER,\n    rating_url              TEXT,\n    rating_score            INTEGER,\n    rating_timestamp_utc    TEXT,\n    critic                  TEXT,\n    critic_likes            INTEGER,\n    critic_comments         INTEGER,\n    user_id                 INTEGER,\n    user_trialist           INTEGER,\n    user_subscriber         INTEGER,\n    user_eligible_for_trial INTEGER,\n    user_has_payment_method INTEGER,\n    foreign key (movie_id) references movies(movie_id),\n    foreign key (user_id) references lists_users(user_id),\n    foreign key (rating_id) references ratings(rating_id),\n    foreign key (user_id) references ratings_users(user_id)\n)\n\n-- External Knowledge: longest movie title refers to MAX(LENGTH(movie_title)); when it was released refers to movie_release_year;\n\n-- Question: What is the name of the longest movie title? When was it released?", # input
        "", # output - leave this blank for generation!
    ) # no EOS_TOKEN for generation prompts: it would tell the model the text is already finished
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 1024)

<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are a text to SQL query translator. Using the SQLite DB Schema and the External Knowledge, translate the following text question into a SQLite SQL select statement.

### Input:
-- DB Schema: CREATE TABLE "lists"
(
    user_id                     INTEGER
        references lists_users (user_id),
    list_id                     INTEGER not null
        primary key,
    list_title                  TEXT,
    list_movie_number           INTEGER,
    list_update_timestamp_utc   TEXT,
    list_creation_timestamp_utc TEXT,
    list_followers              INTEGER,
    list_url                    TEXT,
    list_comments               INTEGER,
    list_description            TEXT,
    list_cover_image_url        TEXT,
    list_first_image_url        TEXT,
    list_second_image_url       TEXT,
    list_t

We now add LoRA adapters so that only a small fraction of the parameters is updated (the trainer log below reports 0.52% of all parameters as trainable).

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.6.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
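The trainable-parameter count that the trainer later reports (41,943,040) follows directly from the LoRA settings above. For each targeted linear layer with `in` inputs and `out` outputs, LoRA adds an `r x in` matrix A and an `out x r` matrix B, which is `r * (in + out)` parameters. The sketch below reproduces the number. The Llama-3.1-8B shapes (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128) are taken from the published model config and are not read from this notebook.

```python
# Llama-3.1-8B shapes from the public model config (assumption: not read from the notebook).
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # grouped-query attention: 8 key/value heads x head_dim 128


def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


r = 16  # same rank as get_peft_model above
per_layer = (
    lora_params(HIDDEN, HIDDEN, r)          # q_proj
    + lora_params(HIDDEN, KV_DIM, r)        # k_proj
    + lora_params(HIDDEN, KV_DIM, r)        # v_proj
    + lora_params(HIDDEN, HIDDEN, r)        # o_proj
    + lora_params(HIDDEN, INTERMEDIATE, r)  # gate_proj
    + lora_params(HIDDEN, INTERMEDIATE, r)  # up_proj
    + lora_params(INTERMEDIATE, HIDDEN, r)  # down_proj
)
total = per_layer * NUM_LAYERS
print(f"{total:,}")                          # 41,943,040
print(f"{100 * total / 8_072_204_288:.2f}%")  # 0.52%
```

`lora_alpha` and `lora_dropout` only change how the adapter output is scaled and regularized. They do not affect the parameter count, which scales linearly with `r`.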


In [None]:
# Data Prep
# Transform each record of the .json training file into the Alpaca-style prompt format.

# NOTE: Remember to append EOS_TOKEN to every training example! Otherwise the model never learns to stop and you'll get infinite generations.

lines = []
with open(training_data_file, "r") as f:
    lines = f.readlines()

print(lines[:2])

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(example):
    # messages[0] = system instruction, [1] = input (schema + question), [2] = target SQL
    instruction = example["messages"][0]["content"]
    input_text  = example["messages"][1]["content"]  # avoid shadowing the built-in input()
    output      = example["messages"][2]["content"]
    text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
    return {"text": text}

from datasets import load_dataset
dataset_raw = load_dataset("json", data_files=training_data_file, split="train")
dataset = dataset_raw.map(formatting_prompts_func, batched=False)
dataset

['{"messages":[{"content":"You are a text to SQL query translator. Using the SQLite DB Schema and the External Knowledge, translate the following text question into a SQLite SQL select statement.","role":"system"},{"content":"-- DB Schema: CREATE TABLE \\"lists\\"\\n(\\n    user_id                     INTEGER\\n        references lists_users (user_id),\\n    list_id                     INTEGER not null\\n        primary key,\\n    list_title                  TEXT,\\n    list_movie_number           INTEGER,\\n    list_update_timestamp_utc   TEXT,\\n    list_creation_timestamp_utc TEXT,\\n    list_followers              INTEGER,\\n    list_url                    TEXT,\\n    list_comments               INTEGER,\\n    list_description            TEXT,\\n    list_cover_image_url        TEXT,\\n    list_first_image_url        TEXT,\\n    list_second_image_url       TEXT,\\n    list_third_image_url        TEXT\\n)\\n\\nCREATE TABLE \\"movies\\"\\n(\\n    movie_id             INTEGER not null\

Dataset({
    features: ['messages', 'text'],
    num_rows: 6599
})
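To see what a single training example looks like after the `map` step, the sketch below formats one hypothetical toy record with the same Alpaca template. It is a variant of `formatting_prompts_func`: messages are looked up by role rather than by position, assuming each of `"system"`, `"user"`, and `"assistant"` appears once. The literal `"<EOS>"` is a placeholder for `tokenizer.eos_token`.

```python
import json

# Same template as the notebook's alpaca_prompt.
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<EOS>"  # placeholder; the notebook uses tokenizer.eos_token


def format_record(record, eos_token=EOS_TOKEN):
    """Role-based variant of formatting_prompts_func (assumes one message per role)."""
    by_role = {m["role"]: m["content"] for m in record["messages"]}
    text = ALPACA_PROMPT.format(by_role["system"], by_role["user"], by_role["assistant"])
    return {"text": text + eos_token}


# Hypothetical toy record in the same "messages" layout as the training file.
line = json.dumps({"messages": [
    {"role": "system", "content": "You are a text to SQL query translator."},
    {"role": "user", "content": "-- Question: How many movies are there?"},
    {"role": "assistant", "content": "SELECT COUNT(*) FROM movies"},
]})
example = format_record(json.loads(line))
print(example["text"])
```

The target SQL sits directly after `### Response:` and is immediately followed by the EOS token. This is what teaches the model to stop after the query.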

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We run only 60 steps to speed things up. For a full run, set `num_train_epochs = 1` and remove the `max_steps` line (its default of -1 defers to `num_train_epochs`). Unsloth also supports TRL's `DPOTrainer`!
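How much of the dataset do 60 steps actually cover? The arithmetic below uses the batch settings from the next cell and the 6,599 rows reported above. The steps-per-epoch formula (batches per epoch floor-divided by gradient-accumulation steps) is an approximation of how the Trainer counts optimizer steps, stated here as an assumption.

```python
import math

num_examples = 6_599   # num_rows from the Dataset summary above
per_device_batch = 2   # per_device_train_batch_size
grad_accum = 4         # gradient_accumulation_steps
num_gpus = 1

# One optimizer step consumes per_device_batch * grad_accum * num_gpus examples.
effective_batch = per_device_batch * grad_accum * num_gpus
# Approximate optimizer steps in one full epoch (assumed Trainer-style counting).
steps_per_epoch = math.ceil(num_examples / per_device_batch) // grad_accum

max_steps = 60
fraction_seen = max_steps * effective_batch / num_examples
print(effective_batch, steps_per_epoch, f"{fraction_seen:.1%}")  # 8 825 7.3%
```

So the 60-step demo run sees about 480 examples, roughly 7% of one epoch. A full epoch takes on the order of 825 steps with these settings.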

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text", # Explicitly set the text field
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "tensorboard", # or "wandb", "none", etc.
    ),
)


In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.557 GB.
30.502 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 6,599 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/8,072,204,288 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|------|---------------|
| 1 | 1.1243 |
| 2 | 1.1466 |
| 3 | 1.0803 |
| 4 | 1.0073 |
| 5 | 1.2946 |
| 6 | 0.9908 |
| 7 | 0.9453 |
| 8 | 0.9772 |
| 9 | 0.7099 |
| 10 | 0.7306 |


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

144.3049 seconds used for training.
2.41 minutes used for training.
Peak reserved memory = 30.502 GB.
Peak reserved memory for training = 0.0 GB.
Peak reserved memory % of max memory = 77.109 %.
Peak reserved memory for training % of max memory = 0.0 %.


<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!



In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "You are a text to SQL query translator. Using the SQLite DB Schema and the External Knowledge, translate the following text question into a SQLite SQL select statement.", # instruction
        "-- DB Schema: CREATE TABLE \"lists\"\n(\n    user_id                     INTEGER\n        references lists_users (user_id),\n    list_id                     INTEGER not null\n        primary key,\n    list_title                  TEXT,\n    list_movie_number           INTEGER,\n    list_update_timestamp_utc   TEXT,\n    list_creation_timestamp_utc TEXT,\n    list_followers              INTEGER,\n    list_url                    TEXT,\n    list_comments               INTEGER,\n    list_description            TEXT,\n    list_cover_image_url        TEXT,\n    list_first_image_url        TEXT,\n    list_second_image_url       TEXT,\n    list_third_image_url        TEXT\n)\n\nCREATE TABLE \"movies\"\n(\n    movie_id             INTEGER not null\n        primary key,\n    movie_title          TEXT,\n    movie_release_year   INTEGER,\n    movie_url            TEXT,\n    movie_title_language TEXT,\n    movie_popularity     INTEGER,\n    movie_image_url      TEXT,\n    director_id          TEXT,\n    director_name        TEXT,\n    director_url         TEXT\n)\n\nCREATE TABLE \"ratings_users\"\n(\n    user_id                 INTEGER\n        references lists_users (user_id),\n    rating_date_utc         TEXT,\n    user_trialist           INTEGER,\n    user_subscriber         INTEGER,\n    user_avatar_image_url   TEXT,\n    user_cover_image_url    TEXT,\n    user_eligible_for_trial INTEGER,\n    user_has_payment_method INTEGER\n)\n\nCREATE TABLE lists_users\n(\n    user_id                 INTEGER not null ,\n    list_id                 INTEGER not null ,\n    list_update_date_utc    TEXT,\n    list_creation_date_utc  TEXT,\n    user_trialist           INTEGER,\n    user_subscriber         INTEGER,\n    user_avatar_image_url   TEXT,\n    user_cover_image_url    TEXT,\n    user_eligible_for_trial TEXT,\n    user_has_payment_method TEXT,\n    primary key (user_id, list_id),\n    foreign key (list_id) references lists(list_id),\n    foreign key (user_id) references lists(user_id)\n)\n\nCREATE TABLE ratings\n(\n    movie_id                INTEGER,\n    rating_id               INTEGER,\n    rating_url              TEXT,\n    rating_score            INTEGER,\n    rating_timestamp_utc    TEXT,\n    critic                  TEXT,\n    critic_likes            INTEGER,\n    critic_comments         INTEGER,\n    user_id                 INTEGER,\n    user_trialist           INTEGER,\n    user_subscriber         INTEGER,\n    user_eligible_for_trial INTEGER,\n    user_has_payment_method INTEGER,\n    foreign key (movie_id) references movies(movie_id),\n    foreign key (user_id) references lists_users(user_id),\n    foreign key (rating_id) references ratings(rating_id),\n    foreign key (user_id) references ratings_users(user_id)\n)\n\n-- External Knowledge: longest movie title refers to MAX(LENGTH(movie_title)); when it was released refers to movie_release_year;\n\n-- Question: What is the name of the longest movie title? When was it released?", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 1024, use_cache = False)
tokenizer.batch_decode(outputs)

The following generation flags are not valid and may be ignored: ['cache_implementation']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


['<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nYou are a text to SQL query translator. Using the SQLite DB Schema and the External Knowledge, translate the following text question into a SQLite SQL select statement.\n\n### Input:\n-- DB Schema: CREATE TABLE "lists"\n(\n    user_id                     INTEGER\n        references lists_users (user_id),\n    list_id                     INTEGER not null\n        primary key,\n    list_title                  TEXT,\n    list_movie_number           INTEGER,\n    list_update_timestamp_utc   TEXT,\n    list_creation_timestamp_utc TEXT,\n    list_followers              INTEGER,\n    list_url                    TEXT,\n    list_comments               INTEGER,\n    list_description            TEXT,\n    list_cover_image_url        TEXT,\n    list_first_image_url        TEXT,\n    list_second_image_ur
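`tokenizer.batch_decode(outputs)` returns the whole prompt plus the generated text. To get only the SQL, you can cut everything up to the `### Response:` marker and drop anything from the EOS token onward. The helper below, `extract_response`, is a hypothetical sketch and not part of the notebook. The sample string, its SQL, and the `"<|eot_id|>"` EOS token are illustrative assumptions, not output from this model.

```python
def extract_response(decoded, eos_token=None, marker="### Response:"):
    """Return the generated text after the first response marker.

    Everything from the first eos_token onward is dropped (if eos_token is given).
    Returns an empty string if the marker is missing.
    """
    _, found, response = decoded.partition(marker)
    if not found:
        return ""
    if eos_token:
        response = response.split(eos_token, 1)[0]
    return response.strip()


# Toy decoded string shaped like the batch_decode output above (hypothetical SQL).
sample = (
    "<|begin_of_text|>Below is an instruction ...\n\n### Response:\n"
    "SELECT movie_title, movie_release_year FROM movies "
    "ORDER BY LENGTH(movie_title) DESC LIMIT 1<|eot_id|>"
)
print(extract_response(sample, eos_token="<|eot_id|>"))
```

In the notebook this would be called as `extract_response(tokenizer.batch_decode(outputs)[0], tokenizer.eos_token)`. Alternatively, passing `skip_special_tokens=True` to `batch_decode` removes special tokens before the split.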

In [None]:
# # alpaca_prompt = Copied from above
# FastLanguageModel.for_inference(model) # Enable native 2x faster inference
# inputs = tokenizer(
# [
#     alpaca_prompt.format(
#         "You are a text to SQL query translator. Using the SQLite DB Schema and the External Knowledge, translate the following text question into a SQLite SQL select statement.", # instruction
#         "-- DB Schema: CREATE TABLE \"lists\"\n(\n    user_id                     INTEGER\n        references lists_users (user_id),\n    list_id                     INTEGER not null\n        primary key,\n    list_title                  TEXT,\n    list_movie_number           INTEGER,\n    list_update_timestamp_utc   TEXT,\n    list_creation_timestamp_utc TEXT,\n    list_followers              INTEGER,\n    list_url                    TEXT,\n    list_comments               INTEGER,\n    list_description            TEXT,\n    list_cover_image_url        TEXT,\n    list_first_image_url        TEXT,\n    list_second_image_url       TEXT,\n    list_third_image_url        TEXT\n)\n\nCREATE TABLE \"movies\"\n(\n    movie_id             INTEGER not null\n        primary key,\n    movie_title          TEXT,\n    movie_release_year   INTEGER,\n    movie_url            TEXT,\n    movie_title_language TEXT,\n    movie_popularity     INTEGER,\n    movie_image_url      TEXT,\n    director_id          TEXT,\n    director_name        TEXT,\n    director_url         TEXT\n)\n\nCREATE TABLE \"ratings_users\"\n(\n    user_id                 INTEGER\n        references lists_users (user_id),\n    rating_date_utc         TEXT,\n    user_trialist           INTEGER,\n    user_subscriber         INTEGER,\n    user_avatar_image_url   TEXT,\n    user_cover_image_url    TEXT,\n    user_eligible_for_trial INTEGER,\n    user_has_payment_method INTEGER\n)\n\nCREATE TABLE lists_users\n(\n    user_id                 INTEGER not null ,\n    list_id                 INTEGER not null ,\n    list_update_date_utc    TEXT,\n    list_creation_date_utc  TEXT,\n    user_trialist           INTEGER,\n    user_subscriber         INTEGER,\n    user_avatar_image_url   TEXT,\n    user_cover_image_url    TEXT,\n    user_eligible_for_trial TEXT,\n    user_has_payment_method TEXT,\n    primary key (user_id, list_id),\n    foreign key (list_id) references lists(list_id),\n    foreign key (user_id) references lists(user_id)\n)\n\nCREATE TABLE ratings\n(\n    movie_id                INTEGER,\n    rating_id               INTEGER,\n    rating_url              TEXT,\n    rating_score            INTEGER,\n    rating_timestamp_utc    TEXT,\n    critic                  TEXT,\n    critic_likes            INTEGER,\n    critic_comments         INTEGER,\n    user_id                 INTEGER,\n    user_trialist           INTEGER,\n    user_subscriber         INTEGER,\n    user_eligible_for_trial INTEGER,\n    user_has_payment_method INTEGER,\n    foreign key (movie_id) references movies(movie_id),\n    foreign key (user_id) references lists_users(user_id),\n    foreign key (rating_id) references ratings(rating_id),\n    foreign key (user_id) references ratings_users(user_id)\n)\n\n-- External Knowledge: longest movie title refers to MAX(LENGTH(movie_title)); when it was released refers to movie_release_year;\n\n-- Question: What is the name of the longest movie title? When was it released?", # input
#         "", # output - leave this blank for generation!
#     )
# ], return_tensors = "pt").to("cuda")

# outputs = model.generate(**inputs, max_new_tokens = 1024, use_cache = True)
# tokenizer.batch_decode(outputs)

You can also use Hugging Face's `AutoPeftModelForCausalLM`. Only use it if you do not have `unsloth` installed. It can be very slow, because downloading pre-quantized `4bit` models is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # I highly do NOT suggest - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

In [None]:
# Uncomment to save only the LoRA adapter. The merged model is saved in a later step.

#model.save_pretrained("Meta-Llama-3.1-8B-Instruct_lora_model")  # Local saving
#tokenizer.save_pretrained("Meta-Llama-3.1-8B-Instruct_lora_model")

#model.push_to_hub("amiryo/Meta-Llama-3.1-8B-Instruct-lora-only", token = "...") # Online saving
#tokenizer.push_to_hub("amiryo/Meta-Llama-3.1-8B-Instruct-lora-only", token = "...") # Online saving

Saved model to https://huggingface.co/amiryo/Meta-Llama-3.1-8B-Instruct-lora-only



In [None]:

# We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

# Merge to 16-bit. Saving to float16 is suitable for vLLM etc.
if False: model.push_to_hub_merged("your_hf/Meta-Llama-3.1-8B-Instruct-merged-16bit", tokenizer, save_method = "merged_16bit", token = "hf_...")
if False: model.save_pretrained_merged("Meta-Llama-3.1-8B-Instruct-merged-16bit", tokenizer, save_method = "merged_16bit",)

# Just LoRA adapters
if False: model.push_to_hub_merged("your_hf/Meta-Llama-3.1-8B-Instruct-lora", tokenizer, save_method = "lora", token = "hf_...")
if False: model.save_pretrained_merged("Meta-Llama-3.1-8B-Instruct-lora", tokenizer, save_method = "lora",)
