# LLM-Based Geospatial Query Translation Evaluation

In this research, we constructed few-shot prompts that instruct a large language model (LLM) to translate user queries into the corresponding geospatial function calls. For the translation task, we used the Cohere Labs Command R 08-2024 language model (Cohere Labs, 2024). This model is a 32-billion-parameter generative LLM optimized for reasoning, summarization, and function calling, which makes it a natural candidate for geospatial query translation (Cohere Labs, 2024).

We assess the model in this setup with two evaluation metrics:

- **Exact Match Accuracy (EMA):** The proportion of predictions that exactly match the target function calls.
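- **Levenshtein Similarity (LS):** One minus the Levenshtein distance (the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other) divided by the length of the longer string. The score ranges from 0 to 1, and we report its average over all test samples.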
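
For reference, the two metrics can be written as follows. The notation is ours and not part of the original write-up: $N$ is the number of test samples, $\hat{y}_i$ the predicted function call, $y_i$ the target, and $d_{\mathrm{Lev}}$ the Levenshtein distance.

$$
\mathrm{EMA} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = y_i\right],
\qquad
\mathrm{LS}(\hat{y}_i, y_i) = 1 - \frac{d_{\mathrm{Lev}}(\hat{y}_i, y_i)}{\max\left(|\hat{y}_i|, |y_i|\right)}
$$

When both strings are empty, LS is defined as 1. The reported LS is the mean of $\mathrm{LS}(\hat{y}_i, y_i)$ over the $N$ samples.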

In [34]:
import json
import requests
from typing import List
from tqdm import tqdm

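# Levenshtein distance (two-row dynamic programming)
def levenshtein_distance(s1: str, s2: str) -> int:
    """Return the minimum number of single-character edits turning s1 into s2."""
    # Keep s2 as the shorter string so each row has len(s2) + 1 entries.
    if len(s1) < len(s2):
        return levenshtein_distance(s2, s1)
    if len(s2) == 0:
        return len(s1)
    # Row 0: distance from the empty prefix of s1 to each prefix of s2.
    previous_row = list(range(len(s2) + 1))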
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]

def levenshtein_similarity(s1: str, s2: str) -> float:
    if max(len(s1), len(s2)) == 0:
        return 1.0
    return 1 - levenshtein_distance(s1, s2) / max(len(s1), len(s2))
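
As a quick sanity check of the similarity score, the sketch below recomputes it for one gold/prediction pair from this notebook's results, where the prediction is a truncated prefix of the target. The helpers `lev` and `lev_sim` are compact stand-ins written for this example so that it runs on its own; they are intended to return the same values as the functions above.

```python
def lev(a: str, b: str) -> int:
    """Two-row Levenshtein distance between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def lev_sim(a: str, b: str) -> float:
    """Similarity in [0, 1]; two empty strings count as identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - lev(a, b) / longest


gold = "MoveToExtent(33.9391, 67.7099, 52.2297, 21.0122)"
pred = "MoveToExtent(33.9391, 67.7"

distance = lev(pred, gold)
similarity = lev_sim(pred, gold)
print(distance, round(similarity, 6))  # 22 0.541667
```

The prediction is missing the last 22 of the target's 48 characters, so the distance is 22 and the similarity is 1 - 22/48, which matches the 0.541667 reported for that sample.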


In [35]:
# Load Test Data
test_data_path = '../../data/test_data.txt'  # Adjust path if needed

inputs = []
outputs = []

with open(test_data_path, 'r', encoding='utf-8') as f:
    for line in f:
        item = json.loads(line)
        inputs.append(item['input'])
        outputs.append(item['output'])

print(f"Loaded {len(inputs)} test samples.")


Loaded 354 test samples.
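
The loader above expects one JSON object per line with `input` and `output` keys. The sketch below illustrates that layout with two samples and a slightly more defensive reader; `load_pairs`, the demo file name, and the blank-line and missing-key handling are our assumptions for illustration, not part of the original pipeline.

```python
import json
import os
import tempfile

# Two records in the assumed JSON Lines layout, plus a blank line.
sample_lines = [
    json.dumps({"input": "Decrease the zoom by 6 levels.", "output": "ZoomOut(6)"}),
    "",
    json.dumps({"input": "Move map view to 51.3595, 25.5972!",
                "output": "Move(51.3595, 25.5972)"}),
]


def load_pairs(path):
    """Read (input, output) pairs from a JSON Lines file, skipping blank lines."""
    pairs = []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            item = json.loads(line)
            if "input" not in item or "output" not in item:
                raise ValueError(f"line {line_no}: missing 'input' or 'output'")
            pairs.append((item["input"], item["output"]))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_test_data.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("\n".join(sample_lines) + "\n")
    demo_pairs = load_pairs(demo_path)

print(demo_pairs[0])  # ('Decrease the zoom by 6 levels.', 'ZoomOut(6)')
```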


In [36]:
# Prompt Construction (Few-shot, 10 unique labels)
def construct_prompt(user_query: str) -> str:
    """Constructs a prompt for the LLM using 10 few-shot examples with unique function labels from the training data."""
    examples = [
        ("I'd like to zoom out by 2 levels", "ZoomOut(2)"),
        ("Show the seismic activity map from WMS URL https://seismic.activity/wms", "AddWMS('https://seismic.activity/wms')"),
        ("Load the point vector using point_zones_NY_kpn.kml!", "AddVector('point', 'point_zones_NY_kpn.kml')"),
        ("Add marker 'University' at location -73.1888, 122.889!", "AddMarker('University', [-73.1888, 122.889])"),
        ("Set map bounds from 62.2585, -120.3652 to 63.8833, -3.3906.", "MoveToExtent(62.2585, -120.3652, 63.8833, -3.3906)"),
        ("Switch to the OpenMallMap layer for retail therapy.", "AddLayer('OpenMallMap')"),
        ("Can we go to 40.5267, -79.4892?", "Move(40.5267, -79.4892)"),
        ("Draw a Line on the map!", "Draw('Line')"),
        ("Set the background color to ivory.", "Cartography('background', 'ivory', null)"),
        ("Zoom in by 7 levels to focus on the details.", "ZoomIn(7)")
    ]
    prompt = "You are an expert system that translates user queries into geospatial function calls. Here are some examples:\n"
    for inp, out in examples:
        prompt += f"User: {inp}\nFunction Call: {out}\n"
    prompt += f"User: {user_query}\nFunction Call:"
    return prompt
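
To make the prompt layout concrete, the sketch below builds the same structure from only two of the examples and prints the result. `build_prompt` is a hypothetical, parameterized variant written for this illustration; the notebook itself uses `construct_prompt` with all ten examples.

```python
HEADER = (
    "You are an expert system that translates user queries into "
    "geospatial function calls. Here are some examples:\n"
)


def build_prompt(user_query, examples):
    """Few-shot prompt: header, one User/Function Call pair per example, then the query."""
    shots = "".join(f"User: {inp}\nFunction Call: {out}\n" for inp, out in examples)
    # The prompt ends on an open "Function Call:" so the model completes the call.
    return f"{HEADER}{shots}User: {user_query}\nFunction Call:"


demo_examples = [
    ("I'd like to zoom out by 2 levels", "ZoomOut(2)"),
    ("Draw a Line on the map!", "Draw('Line')"),
]
demo_prompt = build_prompt("Decrease the zoom by 6 levels.", demo_examples)
print(demo_prompt)
```

Passing the examples as an argument would make it straightforward to vary the number of shots, should that comparison be of interest.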


In [None]:
# LLM Inference (set your Cohere API key before running)

# Replace with your actual Cohere API key
COHERE_API_KEY = 'YOUR API KEY'


def call_cohere_command_r(prompt: str) -> str:
    url = "https://api.cohere.ai/v1/generate"
    headers = {
        "Authorization": f"Bearer {COHERE_API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "command-r-08-2024",
        "prompt": prompt,
        "max_tokens": 64,
        "temperature": 0.0
    }
    response = requests.post(
        url,
        headers=headers,
        json=data,
        timeout=60,  # fail instead of hanging indefinitely on a stalled request
    )
    response.raise_for_status()
    return response.json()['generations'][0]['text'].strip()


In [38]:
import re

def extract_function_call(llm_output: str) -> str:
    """
    Extracts the function call from the LLM output.
    Assumes the function call is after the last 'Function Call:' in the string.
    """
    # Find all occurrences of 'Function Call:'
    matches = list(re.finditer(r'Function Call:', llm_output))
    if not matches:
        return llm_output.strip()
    # Get the last match
    last_match = matches[-1]
    # Extract everything after the last 'Function Call:'
    func_call = llm_output[last_match.end():].strip()
    # If the function call is on the next line, get only the first line
    func_call = func_call.split('\n')[0].strip()
    return func_call
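
The behavior of this extraction step is easiest to see on a few sample completions. The sketch below uses `extract_last_call`, a compact stand-in written for this example with `str.rpartition`, which is intended to mirror the logic above; the sample completions are invented for illustration and are not actual model outputs.

```python
MARKER = "Function Call:"


def extract_last_call(llm_output: str) -> str:
    """Return the first line after the last 'Function Call:' marker."""
    _, sep, tail = llm_output.rpartition(MARKER)
    if not sep:
        # No marker at all: fall back to the whole stripped output.
        return llm_output.strip()
    return tail.strip().split("\n")[0].strip()


# 1) Bare completion, no marker.
bare = extract_last_call(" ZoomOut(6)\n")

# 2) Marker followed by trailing chatter on later lines.
chatty = extract_last_call("Function Call: Move(40.5267, -79.4892)\nAnything else?")

# 3) The model continues with an extra invented turn: the LAST marker wins.
continued = extract_last_call(
    "ZoomIn(7)\nUser: Draw a Line on the map!\nFunction Call: Draw('Line')"
)

print(bare)       # ZoomOut(6)
print(chatty)     # Move(40.5267, -79.4892)
print(continued)  # Draw('Line')
```

The third case shows a property worth keeping in mind: if a completion runs on into additional `User:` / `Function Call:` turns, the call from the last turn is returned rather than the answer to the actual query. Whether that matters depends on how often the model continues past the first line.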

In [39]:
# Run Inference and Collect Results
predictions = []

for user_query in tqdm(inputs, desc="Querying LLM"):
    prompt = construct_prompt(user_query)
    prediction = call_cohere_command_r(prompt)
    prediction = extract_function_call(prediction)
    predictions.append(prediction)


Querying LLM: 100%|██████████████████████████████████████████████████████████████████| 354/354 [14:24<00:00,  2.44s/it]


In [40]:
# Evaluation Metrics
# Exact Match Accuracy
exact_matches = [pred == gold for pred, gold in zip(predictions, outputs)]
ema = sum(exact_matches) / len(exact_matches)

# Levenshtein Similarity (average)
lev_sims = [levenshtein_similarity(pred, gold) for pred, gold in zip(predictions, outputs)]
avg_ls = sum(lev_sims) / len(lev_sims)

print(f"Exact Match Accuracy (EMA): {ema:.4f}")
print(f"Average Levenshtein Similarity (LS): {avg_ls:.4f}")


Exact Match Accuracy (EMA): 0.7599
Average Levenshtein Similarity (LS): 0.9098
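
The aggregate scores do not show which function types account for the errors. One possible follow-up, sketched below, is to group exact-match accuracy by the function name of the target call. The helper names are ours, and the handful of gold/prediction pairs is taken from this notebook's results purely as demo input; in practice one would pass `zip(outputs, predictions)`.

```python
from collections import defaultdict

# (gold, prediction) pairs used only to demonstrate the grouping.
demo_pairs = [
    ("ZoomOut(6)", "ZoomOut(6)"),
    ("Move(51.3595, 25.5972)", "Move(51.3595, 25.5972)"),
    ("AddWMS('https://tiles.io/geoserver/wms')", "AddWMS('https://tiles.io/geoserver/wms')"),
    ("AddLayer('OpenMuseumMap')", "AddLayer('OpenMuseumMap')"),
    ("Move(-32.5905, -80.6182)", "Move(-32.5905, -80.6182)"),
    ("MoveToExtent(33.9391, 67.7099, 52.2297, 21.0122)", "MoveToExtent(33.9391, 67.7"),
]


def function_name(call: str) -> str:
    """Text before the first '(' in a function call string."""
    return call.split("(", 1)[0]


def accuracy_by_function(pairs):
    """Exact-match accuracy per target function name."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for gold, pred in pairs:
        name = function_name(gold)
        totals[name] += 1
        hits[name] += int(pred == gold)
    return {name: hits[name] / totals[name] for name in totals}


per_function = accuracy_by_function(demo_pairs)
for name, acc in per_function.items():
    print(f"{name}: {acc:.2f}")
```

On these six pairs, only `MoveToExtent` scores below 1.0, because its prediction is truncated. Six samples say nothing about the full test set; the point is only the shape of the breakdown.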


In [41]:
# Results Table
import pandas as pd

df = pd.DataFrame({
    'User Query': inputs,
    'Expected Output': outputs,
    'LLM Prediction': predictions,
    'Exact Match': exact_matches,
    'Levenshtein Similarity': lev_sims
})

df.head(10)  # Show first 10 results


| | User Query | Expected Output | LLM Prediction | Exact Match | Levenshtein Similarity |
|---|---|---|---|---|---|
| 0 | Decrease the zoom by 6 levels. | ZoomOut(6) | ZoomOut(6) | True | 1.0 |
| 1 | Move map view to 51.3595, 25.5972! | Move(51.3595, 25.5972) | Move(51.3595, 25.5972) | True | 1.0 |
| 2 | Load the polyline vector using polyline_rivers... | AddVector('polyline', 'polyline_rivers_Asia_rf... | AddVector('polyline', 'polyline_rivers_Asia_rf... | True | 1.0 |
| 3 | Connect to WMS using https://tiles.io/geoserve... | AddWMS('https://tiles.io/geoserver/wms') | AddWMS('https://tiles.io/geoserver/wms') | True | 1.0 |
| 4 | I prefer the OpenMuseumMap layer for cultural ... | AddLayer('OpenMuseumMap') | AddLayer('OpenMuseumMap') | True | 1.0 |
| 5 | Attach polyline_rivers_London_hsq.shp to overl... | AddVector('polyline', 'polyline_rivers_London_... | AddVector('polyline', 'polyline_rivers_London_... | False | 0.833333 |
| 6 | Update map position to -32.5905, -80.6182. | Move(-32.5905, -80.6182) | Move(-32.5905, -80.6182) | True | 1.0 |
| 7 | Include point features from file point_buildin... | AddVector('point', 'point_buildings_Berlin_nzj... | AddVector('point', 'point_buildings_Berlin_nzj... | True | 1.0 |
| 8 | Load the point vector using point_districts_Pa... | AddVector('point', 'point_districts_Paris_vjw.... | AddVector('point', 'point_districts_Paris_vjw.... | True | 1.0 |
| 9 | Focus on the region from 33.9391, 67.7099 to 5... | MoveToExtent(33.9391, 67.7099, 52.2297, 21.0122) | MoveToExtent(33.9391, 67.7 | False | 0.541667 |
