# **Fine-tuning UrbanLLM Original**

This notebook seeks to fine-tune base LLama2-7B model based on the original UrbanLLM research paper. A truncated data set, containing 3059 training samples, is used to provide a fair comparison with the other models used in our FYP. After which, the original UrbanLLM is evaluated against our evaluation samples to assess its capability in spatial understanding. Some changes are made to the original code due to changes in package version and to implement using unsloth package.
*Source: https://github.com/JIANGYUE61610306/UrbanLLM/tree/main*

In [1]:
%%capture
!pip install unsloth

In [2]:
#Import necessary libraries
from unsloth import FastLanguageModel, is_bfloat16_supported
from datasets import Dataset, load_dataset
import os
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
    TextStreamer
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer, SFTConfig
import re
from tabulate import tabulate
import json
from statistics import mean

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!


# **Training**

In [3]:
# Defining constants
model_name = "meta-llama/Llama-2-7b-chat-hf"
new_model = "UrbanLLM"
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
output_dir = "./results"
num_train_epochs = 1
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient normal (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.03

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X updates steps
save_steps = 0

# Log every X updates steps
logging_steps = 1

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on the GPU 0
device_map = {"": 0}

In [4]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "meta-llama/Llama-2-7b-hf",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
    token = HUGGING_FACE_TOKEN, # TODO: Change to your HUGGING_FACE_TOKEN
)

==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [None]:
# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = lora_r,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",],
    lora_alpha = lora_alpha,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth 2025.3.19 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [None]:
file_path = "training_sample_truncated.txt" #Use the first 3059 training samples from "training_sample.txt" defined by Jiang et. al.


loaded_list = []

# Open the file in read mode
with open(file_path, "r") as file:
    # Read each line of the file
    for line in file:
        # Strip whitespace from the line
        line = line.strip()
        # Check if the line is not empty
        if line:
            try:
                # Convert the line to an integer and append it to the list
                loaded_list.append(line)
            except ValueError:
                # Handle the case when the line cannot be converted to an integer
                print("Warning: Skipped line as it cannot be converted to an integer:", line)


# formulating instruction
dataset1_one_ans=[]
text = '<s>[INST]{} Q: {}[/INST]A: {}</s>'
instruction_note = "You are UAPLLM, a Large language model for urban activity that decompose user input into a list of spatio-temporal tasks with the following JSON format: {task: task_name, id: task_id, dep: dependency_task_ids, args: {domain: string, location_name_list: list, location_gps_list: list, time: time, input: sequence, service_no: int, bus_stop: string, task_specific: list}}. Task-specific information is stored using a string list and is used in specific models or tools as arguments. The dep field denotes the ID of the previous task that the current task relies. A dep field value of '-1' indicates that the current task does not rely on other tasks and can be executed immediately, otherwise always execute the previous task first. The executed output, such as the generated location, time, sequence, or POIs from the dependency task is marked as <resource>-task_id. This resource will be used in the next task. The spatio-temporal tasks must be selected from the following options: {long_time_series_prediction, time_series_prediction, event_prediction, trajectory_completion, trajectory_prediction, time_series_anomaly_detection, time_series_imputation, arrival_time_estimation, trajectory_forecasting, map_mapping, recommendation, spatial_relationship_inference, bus_arrival, taxi_availability}. To better understand each spatial-temporal task, here is the explanation and numbering, along with the corresponding examples: 1) Long Time Series Prediction: This task in- volves forecasting future values in a time series over a long horizon. It is typically used for long- term planning and trend analysis in various do- mains, such as weather forecasting, economic fore- casting, and demand planning. 2) Time Series Prediction: This task focuses on predicting future values in a time series over a shorter horizon compared to long time series predic- tion. It is commonly used for short-term forecasts like daily stock prices, temperature forecasts, or short-term sales predictions. 3) Event Prediction: This task involves predict- ing the occurrence of specific events based on his- torical data. Examples include predicting natural disasters, equipment failures, or social events like concerts or sports games. 4)Trajectory Completion: This task involves completing missing parts of a trajectory based on observed segments. It is useful in applications like tracking moving objects, filling in missing GPS data, or reconstructing incomplete travel routes. 5)Trajectory Prediction: This task involves fore- casting the future path of a moving object based on its past trajectory. Applications include predicting the movement of vehicles, pedestrians, or animals. 6)Time Series Anomaly Detection: This task in- volves identifying unusual patterns or outliers in time series data that deviate from expected behav- ior. It is used in applications like fraud detection, fault detection in machinery, and monitoring traffic conditions. 7) Time Series Imputation: This task involves filling in missing values in time series data to en- sure completeness and consistency. It is crucial for maintaining data quality in various applications like traffic records and climate data. 8)Arrival Time Estimation: This task involves predicting the arrival time of a vehicle or person at a specific location based on current and historical data. It is commonly used in transportation systems for buses, trains, and delivery services. 9) Taxi Availability Prediction: This task in- volves predicting the availability of taxis in spe- cific areas at given times. It helps optimize taxi dispatching and improve service for passengers by anticipating demand and ensuring timely availabil- ity. 10) Map Mapping: This task involves mapping addresses to GPS locations and mapping GPS loca- tionsback to addresses . 11) Bus Arrival: This task involves predicting the arrival times of buses at specific stops based on real-time data and historical patterns. It enhances the efficiency of public transportation systems by providing accurate and timely information to com- muters. 12) Spatial Relationship Inference: This task involves deducing spatial relationships between different entities or locations. It is used in urban planning to understand spatial dependencies and interactions, such as proximity analysis, clustering, and spatial correlations. 13) Recommendation: This task involves sug- gesting items or actions to users based on their preferences and historical behavior. Applications include recommending points of interest, routes, or services in urban planning. Please note that there exists a logical connection and order between the tasks. 1) If user input mentioned some specific locations/POIs, usually map mapping task should be included in the answer, otherwise if user is only asking the situation around all Singapore, map mapping task should not be included. 2) If user do not specify the arrival time, usually estimated arrival time task should be included. 3) Include tasks to predict weather and PM2.5 according to user input if user is going to outdoor activities. 4) When recommendation or taxi_availability task is included, usually map mapping task should be included as fundamental task. In case the user input cannot be parsed, an empty JSON response should be provided Please provide a task analysis JSON response basd on the given question."

for i in range(len(loaded_list)):
    pattern = r'Q:\s*(.*?)\s+A:\s*(.*)'
    data=loaded_list[i]
    matches = re.findall(pattern, data)

    for match in matches:
        # index = match[0]
        question = match[0]
        answer = match[1]
        # print(f"Q: {question}\nA: {answer}\n")
        # print(f"Q: {question}")
        # print(text.format(instruction_note, question, answer))
        dataset1_one_ans.append(text.format(instruction_note, question, answer))

In [None]:
from datasets import Dataset, DatasetDict
# my_dict = {'text':['1', '2', '3']}
# my_dict = {'text':dataset1_one_ans[:1000]}
my_dict = {'text':dataset1_one_ans}
dataset1 = Dataset.from_dict(my_dict)

dataset = dataset1
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# **Training UrbanLLM (Original)**

In [None]:
# Set training parameters
training_arguments = SFTConfig(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard",
    dataset_text_field="text",
    packing=packing,
    max_seq_length=max_seq_length,
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    args=training_arguments
)

# Train model
trainer.train()

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/3059 [00:00<?, ? examples/s]

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 3,059 | Num Epochs = 1 | Total steps = 765
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 1 x 1) = 4
 "-____-"     Trainable parameters = 159,907,840/7,000,000,000 (2.28% trained)


Step,Training Loss
1,1.6316
2,1.6594
3,1.7003
4,1.6992
5,1.7159
6,1.7256
7,1.7222
8,1.7452
9,1.7474
10,1.7338


Unsloth: Will smartly offload gradients to save VRAM!


In [None]:
model.save_pretrained("urbanllm_model_og") # Local saving of trained save model
tokenizer.save_pretrained("urbanllm_model_og")

('urbanllm_model_og/tokenizer_config.json',
 'urbanllm_model_og/special_tokens_map.json',
 'urbanllm_model_og/tokenizer.model',
 'urbanllm_model_og/added_tokens.json',
 'urbanllm_model_og/tokenizer.json')

# **Load model**

In [5]:
#Load saved fine-tuned UrbanLLM Original model

dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

if True: # load the LoRA adapters of UrbanLLM we just saved for inference
    urbanLLM_og_model, urbanLLM_og_tokenizer = FastLanguageModel.from_pretrained(
        model_name = "urbanllm_model_og",
        max_seq_length = None,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )

==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2025.3.19 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [6]:
urbanLLM_og_model = FastLanguageModel.get_peft_model(
    urbanLLM_og_model,
    r = lora_r,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",],
    lora_alpha = lora_alpha,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth: Already have LoRA adapters! We shall skip this step.


# **Evaluation**

In [7]:
#Evaluation metric helper functions
#Source: https://capri.readthedocs.io/en/latest/evaluation-metrics.html

def precisionk(actual: list, recommended: list, k:int = 5):
    """
    Computes the number of relevant results among the top k recommended items

    Parameters
    ----------
    actual: list
        A list of ground truth items
        example: [X, Y, Z]
    recommended: list
        A list of ground truth items (all possible relevant items)
        example: [x, y, z]

    Returns
    ----------
        precision at k
    """
    recommended = recommended[:k]
    relevantResults = set(actual) & set(recommended)
    assert 0 <= len(
        relevantResults), f"The number of relevant results is not true (currently: {len(relevantResults)})"
    return 1.0 * len(relevantResults) / len(recommended)


def recallk(actual: list, recommended: list, k:int = 5):
    """
    The number of relevant results among the top k recommended items divided by the total number of relevant items

    Parameters
    ----------
    actual: list
        A list of ground truth items (all possible relevant items)
        example: [X, Y, Z]
    recommended: list
        A list of items recommended by the system
        example: [x, y, z]

    Returns
    ----------
        recall at k
    """
    recommended = recommended[:k]
    relevantResults = set(actual) & set(recommended)
    assert 0 <= len(relevantResults), f"The number of relevant results is not true (currently: {len(relevantResults)})"
    return 1.0 * len(relevantResults) / len(actual)


def mapk(actual: list, predicted: list, k: int = 5):
    """
    Computes mean Average Precision at k (mAPk) for a set of recommended items

    Parameters
    ----------
    actual: list
        A list of ground truth items (order doesn't matter)
        example: [X, Y, Z]
    predicted: list
        A list of predicted items, recommended by the system (order matters)
        example: [x, y, z]
    k: integer, optional (default to 10)
        The number of elements of predicted to consider in the calculation

    Returns
    ----------
    score:
        The mean Average Precision at k (mAPk)
    """
    score = 0.0
    numberOfHits = 0.0
    for i, p in enumerate(predicted):
        if p in actual and p not in predicted[:i]:
            numberOfHits += 1.0
            score += numberOfHits / (i+1.0)
    if not actual:
        return 0.0
    score = score / min(len(actual), k)
    return score

In [8]:
#Printing metrics helper function

def print_metrics_table(results: dict):
    """
    Prints the evaluation metrics in a neat table format.

    :param results: Dictionary containing the metrics, e.g.,
                    {'avg_precision@k': 0.6, 'avg_recall@k': 0.4, 'avg_map@k': 0.7}
    """
    table = [[key, f"{value:.4f}"] for key, value in results.items()]
    print(tabulate(table, headers=["Metric", "Value"], tablefmt="github"))

In [9]:
# Create evaluation dataset

# Initialize lists to store questions and responses
eval_questions = []
eval_responses = []

# Open and read the dataset file
with open("eval_output.txt", "r") as file:
    for line in file:

        # Split each line into question and response
        question, response_json = line.strip().split(", [", 1)

        # Parse the JSON-formatted response
        response = json.loads(response_json[:-1])

        # Append to the respective lists
        eval_questions.append(question)
        eval_responses.append(response)


# Create a Hugging Face Dataset from the lists
eval_dataset = Dataset.from_dict({"question": eval_questions, "response": eval_responses})

In [14]:
import logging

custom_prompt = """
[INST]

You are a helpful assistant that provides structured information about points of interest (POIs).

Your response must be a **single-line JSON string** following this format: r'(\[{{"rank":.*?\}}\])'

Strict rules:
- Strictly provide only the top 5 POI recommendations based on the given question.
- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).
- All fields are mandatory in each object.
- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.
- "coordinates" must always be a list of two strings: ["latitude", "longitude"].
- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.
- Stop generating tokens once you completed your first response satisfying .

Now answer the following:

Question:{}[/INST]

Response:{}
"""

def evaluate_unsloth_model(model, tokenizer, evaluation_data):

    # Initialise and track metrics
    avg_precision = []
    avg_recall = []
    avg_map = []
    retry = 0

    for i in range(len(evaluation_data['question'])):

      # Track progress
      print(f"Progress completed: {100* i/len(evaluation_data['question'])}%")

      try:

        input_text = evaluation_data['question'][i]
        expected_output = evaluation_data['response'][i]
        print(f"Question: {input_text}")
        print(f"Actual: {expected_output}")

        # Tokenize input
        inputs = tokenizer(
              [
                  custom_prompt.format(
                      input_text, # question
                      "", # response - leave this blank for generation
                  )
              ],
              truncation=True,
              return_tensors = "pt").to("cuda")

        # Generate output by streaming text on generation
        text_streamer = TextStreamer(tokenizer)
        outputs = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 700, temperature = 0.1)
        generated_text = str(tokenizer.batch_decode(outputs))
        print(f"generated_text: {generated_text}")

        #Cleaning and extracting relevant portion of response
        instr_pattern = r'\[INST\].*?\[/INST\]'
        cleaned_text = re.sub(instr_pattern, '', generated_text, flags=re.DOTALL)

        # Define the regular expression pattern to match the "name" fields
        pattern = r'"name":\s*"([^"]+)"'

        # Find all matches in the generated_text
        poi_names = re.findall(pattern, generated_text)

        # Output the list of POI names
        generated_output_names = [poi.lower() for poi in poi_names]
        print(f"generated_output_names: {generated_output_names}")

        if not generated_output_names:
            raise ValueError("No POI names found in the JSON data.")

        # Extract 'name' for each actual POI location recommendation
        expected_output_names = [re.sub(r'\W+', '', poi.get('name').lower()) for poi in json.loads(expected_output)[1]]

        # Evaluate on metrics
        precision = precisionk(expected_output_names, generated_output_names)
        recall = recallk(expected_output_names, generated_output_names)
        map = mapk(expected_output_names, generated_output_names)

        #Append to metric list
        avg_precision.append(precision)
        avg_recall.append(recall)
        avg_map.append(map)

        print(f"Avg Precision: {avg_precision}")
        print(f"Avg Recall: {avg_recall}")
        print(f"Avg MAP: {avg_map}")

      except:
        #Track number of samples that fail data extraction
        logging.exception('')

        with open("urbanllm_og_eval_failed_qn.txt", "a", encoding="utf-8") as file:
            file.write(f"{input_text}\n")

        retry += 1

        if retry > 100:
          break


    #Print results
    print(f"Training completed.")
    results = {'avg_precision': mean(avg_precision) if len(avg_precision) > 0 else 0, 'avg_recall': mean(avg_recall) if len(avg_recall) > 0 else 0, 'avg_map': mean(avg_map) if len(avg_map) > 0 else 0}
    print_metrics_table(results)

    return results

evaluate_unsloth_model(urbanLLM_og_model, urbanLLM_og_tokenizer, eval_dataset)

Progress completed: 0.0%
Question: Are there any new restaurant openings in district 19 at Waterway Point, 828761?
Actual: ["result", [{"rank": 1, "distance_in_km": 13.68, "name": "Din Tai Fung", "location": "435 Orchard Rd, Level 4, Singapore 238877", "coordinates": ["1.303999", "103.83318"], "price": "$$", "categories": "Taiwanese, Dim Sum, Dumplings", "opening_hours": "11:30 AM - 10:00 PM", "rating": "4.3", "review_count": 158, "district": 19, "attributes": "Regular hours"}, {"rank": 2, "distance_in_km": 13.73, "name": "Taste Paradise", "location": "2 Orchard Turn, #04-07, Singapore 238801", "coordinates": ["1.304235", "103.832011"], "price": "$$", "categories": "Dim Sum, Cantonese", "opening_hours": "11:00 AM - 3:00 PM, 6:00 PM - 11:00 PM", "rating": "4.5", "review_count": 32, "district": 19, "attributes": "Regular hours"}, {"rank": 3, "distance_in_km": 10.94, "name": "SilverKris Lounge", "location": "65 Airport Blvd, #036-133, Singapore 819663", "coordinates": ["1.3561", "103.9868

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Are there any new restaurant openings in district 7 at DUO Tower, 189352?[/INST]\n\nResponse:\n["result",

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where can I deposit cash for my Standard Chartered bank account in district 28 at Seletar Aerospace Park,

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where can I deposit cash for my Standard Chartered bank account in district 25 at Woodlands Regional Libr

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where can I deposit cash for my Standard Chartered bank account in district 10 at Dempsey Hill, 249679?[/

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where can I deposit cash for my Standard Chartered bank account in district 22 at IMM, 609601?[/INST]\n\n

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where can I deposit cash for my Standard Chartered bank account in district 15 at Joo Chiat Complex, 4200

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where can I deposit cash for my Standard Chartered bank account in district 28 at Seletar Aerospace Park,

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where can I deposit cash for my Standard Chartered bank account in district 2 at AXA Tower, 068811?[/INST

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where is the nearest hospital with a cardiology department within district 4 at Mount Faber Peak, 099203?

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where can I deposit cash for my Standard Chartered bank account in district 26 at Upper Thomson Plaza, 57

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where can I deposit cash for my Standard Chartered bank account in district 22 at IMM, 609601?[/INST]\n\n

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.


generated_text: ['<s> \n[INST]\n\nYou are a helpful assistant that provides structured information about points of interest (POIs).\n\nYour response must be a **single-line JSON string** following this format: r\'(\\[{"rank":.*?\\}\\])\'\n\nStrict rules:\n- Strictly provide only the top 5 POI recommendations based on the given question.\n- JSON must be a valid, compact, **single-line** string (no newlines, no indentation).\n- All fields are mandatory in each object.\n- Wrap the entire list with: ["result", [ ... ]]. All POI recommendations must be enclosed within the nested list structure.\n- "coordinates" must always be a list of two strings: ["latitude", "longitude"].\n- Do NOT include any explanation, instruction, or extra text — return only the raw JSON string.\n- Stop generating tokens once you completed your first response satisfying .\n\nNow answer the following:\n\nQuestion:Where can I deposit cash for my Standard Chartered bank account in district 28 at Seletar Aerospace Park,

ERROR:root:
Traceback (most recent call last):
  File "<ipython-input-14-2d6f44ece168>", line 181, in evaluate_unsloth_model
    raise ValueError("No POI names found in the JSON data.")
ValueError: No POI names found in the JSON data.



generated_output_names: []
Progress completed: 99.0%
Question: Where are hospital in district 24 at Kranji War Memorial, 738656 with wheelchair accessibility?
Actual: ["result", [{"rank": 1, "distance_in_km": 3.82, "name": "Woodlands Joint Testing & Vaccination Centre", "location": "Woodlands", "coordinates": ["1.4376172", "103.7865619"], "opening_hours": "24 hours", "district": 24, "attributes": "Wheelchair accessible"}, {"rank": 2, "distance_in_km": 4.9, "name": "Maria Hospital", "location": "Woodlands", "coordinates": ["1.4659796", "103.7595047"], "opening_hours": "24 hours", "district": 24, "attributes": "Wheelchair accessible"}]]
<s> 
[INST]

You are a helpful assistant that provides structured information about points of interest (POIs).

Your response must be a **single-line JSON string** following this format: r'(\[{"rank":.*?\}\])'

Strict rules:
- Strictly provide only the top 5 POI recommendations based on the given question.
- JSON must be a valid, compact, **single-line**

{'avg_precision': 0.011149825783972125,
 'avg_recall': 0.01364692218350755,
 'avg_map': 0.007876500193573364}