# Fine-Tuning an LLM to Better Understand LTL Semantics

## Install Dependencies

In [1]:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers" trl peft accelerate bitsandbytes

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-_2l0jciy/unsloth_9e5dbbe183814f2f99c137c37f8eb5af
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-_2l0jciy/unsloth_9e5dbbe183814f2f99c137c37f8eb5af
  Resolved https://github.com/unslothai/unsloth.git to commit 229e2ecc67756f36316dfcbea42396f59eef44e0
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting unsloth_zoo>=2025.9.9 (from unsloth@ git+https://github.com/unslothai/unsloth.git->unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Downloading unsloth_zoo-2025.9.9-py3-none-any.whl.metadata (31 kB)
Collecting tyro (from unsloth@ git+https://github.com/unslothai/unsloth.git-

## Prepare Dataset

In [3]:
from datasets import load_dataset
import torch

# Load your prepared dataset from the dataset.jsonl file
dataset = load_dataset("json", data_files="dataset.jsonl", split="train")

print(f"Total samples: {len(dataset)}")

# Format the dataset for the model's chat template
def format_instruction(example):
    """Formats the instruction and output using a clean chat template."""
    instruction = example['instruction']
    output = example['output']

    formatted_text = f"""<|im_start|>system
You are a specialized translator that converts natural language drone commands into Linear Temporal Logic (LTL) formulas. Respond only with the LTL formula.
<|im_end|><|im_start|>user
{instruction}<|im_end|><|im_start|>assistant
{output}<|im_end|>"""

    return {"text": formatted_text}

# Apply the formatting to the entire dataset
dataset = dataset.map(format_instruction)

# Create a train/validation split (e.g., 80/20) for evaluation
dataset = dataset.train_test_split(test_size=0.2, seed=42)

train_dataset = dataset["train"]
eval_dataset = dataset["test"]

print(f"Training samples: {len(train_dataset)}")
print(f"Validation samples: {len(eval_dataset)}")

Total samples: 333


Training samples: 266
Validation samples: 67
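
The formatting step above reads an `instruction` and an `output` field from every record, so a malformed line in `dataset.jsonl` would only surface as a `KeyError` during `dataset.map`. A minimal sketch of a pre-flight check follows. The helper `validate_records` and the sample lines are hypothetical and not part of the original notebook; only the two field names come from the code above.

```python
import json

# Field names read by format_instruction in the cell above
REQUIRED_FIELDS = ("instruction", "output")

def validate_records(lines):
    """Return (line_number, problem) pairs for records that would break formatting."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "record is not a JSON object"))
            continue
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((lineno, f"missing or empty field: {field}"))
    return problems

# Hypothetical sample lines (illustrative only, not taken from the real dataset)
sample = [
    '{"instruction": "fly to X", "output": "F(at(X))"}',
    '{"instruction": "wait 20 seconds"}',
    'not json',
]
for lineno, problem in validate_records(sample):
    print(f"line {lineno}: {problem}")
```

To check the real file, the same function can be called on an open file handle, for example `validate_records(open("dataset.jsonl"))`, since iterating a text file yields its lines.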


## Load Model & Tokenizer

In [4]:
from unsloth import FastLanguageModel

model_name = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"
max_seq_length = 256 # Reduced for shorter LTL outputs

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
)

# LoRA configuration: rank-16 adapters on every attention and MLP projection
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_dropout=0, # No dropout for a deterministic translation task
    bias="none",
    use_gradient_checkpointing="unsloth",
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.9.7: Fast Llama patching. Transformers: 4.55.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2025.9.7 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
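
The size of the LoRA adapter configured above can be worked out by hand: each adapted linear layer gains two low-rank matrices, one of shape `r x in_features` and one of shape `out_features x r`. The sketch below does that arithmetic. The layer dimensions are assumptions taken from the publicly documented Llama 3.1 8B configuration, not from this notebook; only `r=16`, the target module names, and the 32-layer count appear in the source.

```python
# Assumed Llama 3.1 8B dimensions (public model config, not stated in this notebook)
HIDDEN = 4096          # hidden size
INTERMEDIATE = 14336   # MLP intermediate size
KV_DIM = 1024          # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32        # matches "patched 32 layers" in the output above
RANK = 16              # r passed to get_peft_model

# (in_features, out_features) for each module listed in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, rank):
    """Parameters added by one LoRA pair: A is rank x in, B is out x rank."""
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
```

Under these assumptions the total comes to 41,943,040, which agrees with the `Trainable parameters` figure Unsloth prints when training starts.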


## Set Up & Run SFTTrainer

In [5]:
from trl import SFTTrainer
from transformers import TrainingArguments, EarlyStoppingCallback

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=3,
        warmup_steps=10,
        eval_strategy="epoch",  # Evaluation at the end of each epoch
        save_strategy="epoch",  # This must match eval_strategy
        logging_steps=1,
        output_dir="outputs",
        optim="adamw_8bit",
        seed=3407,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        save_total_limit=3,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
    ),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)

print("Starting LTL translation training...")
trainer_stats = trainer.train()

print("Training complete!")

Starting LTL translation training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 266 | Num Epochs = 3 | Total steps = 51
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 4 x 1) = 16
 "-____-"     Trainable parameters = 41,943,040 of 8,072,204,288 (0.52% trained)

Unsloth: Will smartly offload gradients to save VRAM!


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.3363 | 0.305347 |
| 2 | 0.1719 | 0.188403 |
| 3 | 0.161 | 0.174084 |


Unsloth: Not an error, but LlamaForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient


Training complete!
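
The step count in the training banner follows from the batch settings in `TrainingArguments`. The sketch below reproduces it. Rounding up at both stages is an assumption, chosen because it matches the reported `Total steps = 51`; the exact rounding may differ between `transformers` versions.

```python
import math

num_examples = 266     # training samples after the 80/20 split
per_device_batch = 4   # per_device_train_batch_size
grad_accum = 4         # gradient_accumulation_steps
num_gpus = 1
epochs = 3             # num_train_epochs

# Examples consumed per optimizer update
effective_batch = per_device_batch * grad_accum * num_gpus

# Mini-batches per epoch, then optimizer updates per epoch (both rounded up: assumption)
batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = steps_per_epoch * epochs

print(f"effective batch = {effective_batch}")
print(f"steps per epoch = {steps_per_epoch}, total steps = {total_steps}")
```

With only 51 optimizer steps in total, `warmup_steps=10` covers roughly the first fifth of training.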


## Test Model

In [10]:

# Spot-check the model on a few hardcoded examples.

def test_model_performance(model, tokenizer):
    """Tests the model's ability to handle various LTL translation tasks."""
    print("\nStarting final model test...")

    # The prompt must exactly match the format used during training
    system_prompt = "You are a specialized translator that converts natural language drone commands into Linear Temporal Logic (LTL) formulas. Respond only with the LTL formula."

    test_queries = [
        # Complex sequential commands with new values and directions
        "rise 15 feet, then travel to point Z, then descend 8 feet",
        "move backwards 5 meters, then hover in place for 20 seconds, then fly to X",
        "go forward 12 meters, then go to Y, then wait a half minute, and return to home",

        # Ambiguous/unknown commands not in the original list
        "explore the entire room",
        "go to the kitchen",
        "turn the lights off",

        # Commands with different phrasing for the same output
        "proceed to point Y, then come back to the origin",
        "ascend 25 meters, and then fly to Z",
        "stay put for a quarter of a minute, then travel to X",

        # Commands with multiple movement types
        "go down 7 meters, then backward 15 meters, then fly to Y, then head back home",

        # Multiple chained wait commands with different units
        "wait 90 seconds, then go to X, then wait for two minutes, and go to Z",
    ]

    for query in test_queries:
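        # Prepare the full prompt in the same layout as format_instruction:
        # system header, system prompt, user turn, then an open assistant turn
        formatted_input = (
            f"<|im_start|>system\n{system_prompt}\n"
            f"<|im_end|><|im_start|>user\n{query}<|im_end|><|im_start|>assistant\n"
        )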

        # Generate a response from the model
        inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=64,
                do_sample=False,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id,
            )

        response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
        print("-" * 50)
        print(f"Query: {query}")
        print(f"Response: {response.strip()}")

test_model_performance(model, tokenizer)


Starting final model test...
--------------------------------------------------
Query: rise 15 feet, then travel to point Z, then descend 8 feet
Response: move(up,15) U F(at(Z)) U move(down,8)<|im_end|>
--------------------------------------------------
Query: move backwards 5 meters, then hover in place for 20 seconds, then fly to X
Response: move(backward,5) U wait(20) U F(at(X))<|im_end|>
--------------------------------------------------
Query: go forward 12 meters, then go to Y, then wait a half minute, and return to home
Response: move(forward,12) U F(at(Y)) U wait(30) U return_to_start()<|im_end|>
--------------------------------------------------
Query: explore the entire room
Response: AMBIGUOUS_QUERY<|im_end|>
--------------------------------------------------
Query: go to the kitchen
Response: F(at(unknown))<|im_end|>
--------------------------------------------------
Query: turn the lights off
Response: AMBIGUOUS_QUERY<|im_end|>
--------------------------------------------
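
The responses above still end with a literal `<|im_end|>`, which suggests the marker is decoded as ordinary text rather than removed by `skip_special_tokens=True` (an assumption about this tokenizer, not something the notebook states). A minimal sketch of two helpers that keep training and inference prompts in one place and trim the marker from responses follows. `build_prompt` and `clean_response` are hypothetical names introduced here; the template itself mirrors `format_instruction`.

```python
SYSTEM_PROMPT = (
    "You are a specialized translator that converts natural language drone "
    "commands into Linear Temporal Logic (LTL) formulas. "
    "Respond only with the LTL formula."
)

def build_prompt(instruction, output=None):
    """Build the chat text in the same layout as format_instruction.

    With output=None the text stops after the assistant header, so the
    model is left to complete the formula (inference). With an output,
    the full training example is returned.
    """
    prompt = (
        f"<|im_start|>system\n{SYSTEM_PROMPT}\n"
        f"<|im_end|><|im_start|>user\n{instruction}<|im_end|><|im_start|>assistant\n"
    )
    if output is not None:
        prompt += f"{output}<|im_end|>"
    return prompt

def clean_response(text, stop="<|im_end|>"):
    """Keep only the text before the first end-of-turn marker."""
    return text.split(stop, 1)[0].strip()

# The inference prompt is a strict prefix of the training text
train_text = build_prompt("go to the kitchen", "F(at(unknown))")
infer_text = build_prompt("go to the kitchen")
print(train_text.startswith(infer_text))

# Trim a response copied from the test output above
print(clean_response("move(backward,5) U wait(20) U F(at(X))<|im_end|>"))
```

Building both prompts from one function makes it harder for the inference layout to drift from the training layout, which the comment in the test cell identifies as a requirement.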

## Export Fine-Tuned Model

In [11]:
print("\nSaving models...")

# Save the merged model as a GGUF file for Ollama
print("Saving GGUF model...")
model.save_pretrained_gguf(
    "Llama3_Crazyflie_gguf",
    tokenizer,
    quantization_method="q4_k_m"
)

# # Save the merged model for other frameworks
# print("Saving merged model..")
# model.save_pretrained_merged(
#     "Llama3_Crazyflie_merged",
#     tokenizer,
#     save_method="merged_4bit_forced"
# )

print("\nAll models saved successfully!")


Saving models...
Saving GGUF model...


Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 5.7G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 2.8 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 47%|████▋     | 15/32 [00:01<00:01, 13.67it/s]
We will save to Disk and not RAM now.
100%|██████████| 32/32 [03:25<00:00,  6.42s/it]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving Llama3_Crazyflie_gguf/pytorch_model-00001-of-00004.bin...
Unsloth: Saving Llama3_Crazyflie_gguf/pytorch_model-00002-of-00004.bin...
Unsloth: Saving Llama3_Crazyflie_gguf/pytorch_model-00003-of-00004.bin...
Unsloth: Saving Llama3_Crazyflie_gguf/pytorch_model-00004-of-00004.bin...
Done.


Unsloth: Converting llama model. Can use fast conversion = False.


==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: CMAKE detected. Finalizing some steps for installation.
Unsloth: [1] Converting model at Llama3_Crazyflie_gguf into f16 GGUF format.
The output location will be /content/Llama3_Crazyflie_gguf/unsloth.F16.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: Llama3_Crazyflie_gguf
INFO:hf-to-gguf:Model architecture: LlamaForCausalLM
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:rope_freqs.weight,           torch.float32 --> F32, shape = {64}
INFO:hf-to-gguf:gguf: loading model weight map from 'pyt