# Can LLMs geocode addresses?

This notebook contains the experiment described in this post: LLMs are better at geography than you think.

If you'd like to run it from scratch, you will need API keys from both [OpenAI](https://openai.com/api/) and [Geocode Earth](https://geocode.earth/). You can obtain data and cut your own samples from the [National Address Database](https://www.transportation.gov/gis/national-address-database). I sliced a sample using the following command, with NAD_r19.txt being the [CSV downloaded here](https://nationaladdressdata.s3.amazonaws.com/NAD_r19_TXT.zip):

```
gshuf -n 250 TXT/NAD_r19.txt >> ~/apps/llm-geography-test/data/nadsample.txt
```
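If `gshuf` (GNU `shuf`, as typically installed on macOS through Homebrew's coreutils) is not available, a similar sample can be cut in Python. The sketch below is not the method used for the original sample. It uses reservoir sampling so the large NAD file never has to fit in memory, and it assumes the first line of the file is a header row worth keeping. The function name `sample_lines` and its parameters are made up for illustration.

```python
import random

def sample_lines(src_path, dst_path, k=250, seed=None):
    """Write the header plus up to k uniformly sampled data lines to dst_path.

    Hypothetical helper: assumes the first line of src_path is a header.
    Returns the number of data lines written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, "r", encoding="utf-8", errors="replace") as src:
        header = src.readline()
        for i, line in enumerate(src):
            if i < k:
                # Fill the reservoir with the first k data lines
                reservoir.append(line)
            else:
                # Replace an existing entry with probability k / (i + 1)
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = line

    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in [header] + reservoir:
            # The last line of a file may lack a trailing newline
            dst.write(line if line.endswith("\n") else line + "\n")
    return len(reservoir)
```

A call such as `sample_lines("TXT/NAD_r19.txt", "data/nadsample.txt", k=250)` would mirror the paths used in the command above.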

If you have questions, feel free to reach out: **chase@localangle.co**


## Setup

This section sets up imports, variables and helper functions.

In [None]:
# Import required libraries
import pandas as pd
import openai
import time
import json
import requests
from IPython.core.magic import register_cell_magic
from IPython import get_ipython

# Set to True if you want to run the geocoder and LLM guesser again
REPROCESS_DATA = False

OPENAI_API_KEY = ''
OPENAI_MODEL = 'gpt-4o'
openai.api_key = OPENAI_API_KEY

GEOCODE_EARTH_API_KEY = ''

# Helper function to skip cells if a condition is met

@register_cell_magic
def skip_if(line, cell):
    if eval(line):
        return
    get_ipython().run_cell(cell)

In [75]:
########## HELPER FUNCTIONS ##########

# LLM functions

def _clean_json_response(response_text: str) -> str:
    """Clean and extract JSON from LLM response."""
    # Remove markdown code blocks if present
    if response_text.startswith('```json'):
        response_text = response_text[7:]
    if response_text.endswith('```'):
        response_text = response_text[:-3]
    
    # Remove any leading/trailing whitespace
    response_text = response_text.strip()
    
    return response_text

def call_llm(prompt: str, model: str = OPENAI_MODEL, force_json: bool = True, max_retries: int = 3) -> str:
    """
    Call OpenAI's LLM with the given prompt and model with exponential backoff retries.
    
    Args:
        prompt (str): The prompt to send to the LLM
        model (str): The model name (e.g., 'gpt-4o-mini', 'gpt-4o')
        force_json (bool): Whether to force JSON output (default: True)
        max_retries (int): Maximum number of retry attempts (default: 3)
        
    Returns:
        str: The LLM response text
        
    Raises:
        ValueError: If no API key is available or prompt is empty
        Exception: If the LLM call fails after all retries
    """
    if not prompt:
        raise ValueError("Prompt cannot be empty")
    
    if not model:
        raise ValueError("Model cannot be empty")
    
    if not OPENAI_API_KEY:
        raise ValueError("OpenAI API key not found. Please set OPENAI_API_KEY.")
    
    # Set system message based on force_json parameter
    if force_json:
        system_message = "You are a helpful assistant that returns only structured JSON output."
    else:
        system_message = "You are a helpful assistant that returns direct, concise responses without markdown formatting or explanations."
    
    for attempt in range(max_retries):
        try:
            response = openai.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": system_message},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.0
            )
            response_text = response.choices[0].message.content.strip()
            return _clean_json_response(response_text)
        except Exception as e:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
                print(f"OpenAI API call failed (attempt {attempt + 1}/{max_retries}): {str(e)}")
                print(f"Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                raise Exception(f"OpenAI API call failed after {max_retries} attempts: {str(e)}")

# Geocoding functions

def geocode_city(city: str, state: str):
    """
    Geocode a city center using the Pelias structured search API.
    
    Args:
        city (str): City (locality) name
        state (str): State (region) name or abbreviation
        
    Returns:
        dict: Coordinates and metadata, or None if geocoding fails
    """
    try:
        url = "https://api.geocode.earth/v1/search/structured"
        params = {
            "locality": city,
            "region": state,
            "size": 1  # Just get the best result
        }
        
        if GEOCODE_EARTH_API_KEY:
            params["api_key"] = GEOCODE_EARTH_API_KEY
            
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()
        
        if data.get("features") and len(data["features"]) > 0:
            feature = data["features"][0]
            coords = feature.get("geometry", {}).get("coordinates", [])
            
            if len(coords) >= 2:
                return {
                    "latitude": coords[1],
                    "longitude": coords[0],
                    "confidence": feature.get("properties", {}).get("confidence", 0),
                    "label": feature.get("properties", {}).get("label", "")
                }
        
        return None
        
    except Exception as e:
        # `address` is not defined in this function; report the city and state instead
        print(f"Geocoding failed for '{city}, {state}': {str(e)}")
        return None

## Load and process the data

This section contains logic for loading and processing the NAD data file, as well as geocoding the city centers and addresses within.

The results from the experiment described in the post are persisted in the file `data/nadsample_geocode.txt`.

Set the `REPROCESS_DATA` variable to `True` if you would like to delete that file and rerun the geographic processing on both fronts. This might be useful if you choose to cut your own sample.

In [76]:
# Load the CSV data into a pandas DataFrame
# Specify Zip_Code as string to preserve leading zeros and handle non-numeric values
df = pd.read_csv('data/nadsample.txt', dtype={'Zip_Code': str})

In [77]:
# Create a full address string by joining the specified fields
address_fields = ['AddNo_Full', 'StNam_Full', 'Post_City', 'State', 'Zip_Code']

df['address_str'] = df[['AddNo_Full', 'StNam_Full']].apply(
    lambda row: f"{row['AddNo_Full']} {row['StNam_Full']}",
    axis=1
)

df['city_state'] = df[['Post_City', 'State']].apply(
    lambda row: f"{row['Post_City']}, {row['State']}",
    axis=1
)

df['address_full'] = df[['address_str', 'city_state', 'Zip_Code']].apply(
    lambda row: f"{row['address_str']} {row['city_state']}, {row['Zip_Code']}",
    axis=1
)

# Filter out records where Post_City is None or "Not stated"
df_filtered = df[(df['Post_City'].notna()) & (df['Post_City'] != 'Not stated')]

# Rename the coordinate and ZIP code columns to short lowercase names
df_filtered = df_filtered.rename(columns={
    'Latitude': 'latitude',
    'Longitude': 'longitude',
    'Zip_Code': 'zip'})

# Display the filtered address_str field along with coordinates
address_data = df_filtered[['address_str', 'city_state', 'zip', 'address_full', 'latitude', 'longitude']]

In [78]:
print("Dataset Shape:", address_data.shape)
address_data.head(50)

Dataset Shape: (188, 6)


Unnamed: 0,address_str,city_state,zip,address_full,latitude,longitude
0,504 TRENTON Street,"HARRIMAN, TN",37748.0,"504 TRENTON Street HARRIMAN, TN, 37748",35.934435,-84.549231
1,22 Nagel Drive,"Buffalo, NY",14225.0,"22 Nagel Drive Buffalo, NY, 14225",42.919926,-78.746159
2,5836 Hubbard Drive,"Rockville, MD",20852.0,"5836 Hubbard Drive Rockville, MD, 20852",39.054235,-77.119597
4,2891 LAKE SHORE Road South,"DENVER, NC",28037.0,"2891 LAKE SHORE Road South DENVER, NC, 28037",35.497915,-80.981462
5,7544 South 2020 East,"OGDEN, UT",84405.0,"7544 South 2020 East OGDEN, UT, 84405",41.129919,-111.926519
6,4626 Parkside Drive,"Baltimore, MD",21206.0,"4626 Parkside Drive Baltimore, MD, 21206",39.32302,-76.55819
8,4162 WINDTREE Drive,"SIGNAL MOUNTAIN, TN",37377.0,"4162 WINDTREE Drive SIGNAL MOUNTAIN, TN, 37377",35.176604,-85.343036
10,2225 East County Road 740 North,"NORTH VERNON, IN",47265.0,"2225 East County Road 740 North NORTH VERNON, ...",39.093891,-85.569821
11,4250 SARATOGA Avenue,"Downers Grove, IL",60515.0,"4250 SARATOGA Avenue Downers Grove, IL, 60515",41.809848,-88.016703
12,1453 WATER LILY Drive,"LITT, TX",75068.0,"1453 WATER LILY Drive LITT, TX, 75068",33.181223,-96.914244


## Geocode addresses and persist coordinates

The first part of this section geocodes city centers as a baseline. The second attempts to estimate each address's lat/lon with GPT-4o (the model set in `OPENAI_MODEL`). The results are persisted in `data/nadsample_geocode.txt`.
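Because the LLM step depends on the model returning well-formed JSON, it can help to validate each response before storing it. The following is a minimal sketch, not part of the original experiment; `parse_llm_coords` is a hypothetical helper that assumes the response uses the `latitude`, `longitude` and `confidence` keys requested in the prompt.

```python
import json

def parse_llm_coords(response_text):
    """Return (lat, lon, confidence) from a JSON string, or None if unusable.

    Hypothetical helper: rejects malformed JSON, missing keys, non-numeric
    values and coordinates outside the valid latitude/longitude ranges.
    """
    try:
        data = json.loads(response_text)
        lat = float(data["latitude"])
        lon = float(data["longitude"])
    except (ValueError, TypeError, KeyError):
        return None

    # Reject impossible coordinates (NaN also fails these comparisons)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None

    # Confidence is optional; keep it only if it is numeric
    try:
        confidence = float(data.get("confidence"))
    except (ValueError, TypeError):
        confidence = None

    return lat, lon, confidence
```

A caller could treat a `None` result the same way the notebook treats a failed API call, leaving the LLM columns empty for that row.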

In [79]:
%%skip_if REPROCESS_DATA == False

# Geocode city centers using the geocode_city function
print("Geocoding city centers...")

# Make a proper copy to avoid SettingWithCopyWarning
address_data = address_data.copy()

# Initialize new columns
address_data['city_lat'] = None
address_data['city_lon'] = None  
address_data['city_confidence'] = None

# Geocode each unique city_state
unique_cities = address_data['city_state'].unique()
city_geocodes = {}

for i, city_state in enumerate(unique_cities):
    print(f"Geocoding {city_state} ({i+1}/{len(unique_cities)})...")

    # Split on the last ", " so a comma inside the city name cannot break unpacking
    city, state = city_state.rsplit(', ', 1)
    city = city.strip()
    state = state.strip()
    
    result = geocode_city(city, state)
    if result:
        city_geocodes[city_state] = {
            'city_lat': result['latitude'],
            'city_lon': result['longitude'],
            'city_confidence': result['confidence']
        }
        print(f"  → {result['latitude']:.4f}, {result['longitude']:.4f} (confidence: {result['confidence']})")
    else:
        print(f"  → Failed to geocode {city}")
        city_geocodes[city_state] = {
            'city_lat': None,
            'city_lon': None,
            'city_confidence': None
        }
    time.sleep(0.5)

# Apply geocoded results to the dataframe
for city_state, coords in city_geocodes.items():
    mask = address_data['city_state'] == city_state
    # Apply to each matching row individually to avoid pandas issues
    for idx in address_data[mask].index:
        address_data.loc[idx, 'city_lat'] = coords['city_lat']
        address_data.loc[idx, 'city_lon'] = coords['city_lon']
        address_data.loc[idx, 'city_confidence'] = coords['city_confidence']

print(address_data.head(5))

# Save the geocoded dataframe to CSV
output_file = 'data/nadsample_geocode.txt'
address_data.to_csv(output_file, index=False)
print(f"\nGeocoded data saved to {output_file}")

# Show summary
print(f"\nSummary:")
print(f"Total addresses: {len(address_data)}")
print(f"Unique cities: {len(unique_cities)}")
successful_geocodes = sum(1 for coords in city_geocodes.values() if coords['city_lat'] is not None)
print(f"Successfully geocoded: {successful_geocodes}/{len(unique_cities)} cities")

Geocoding city centers...
Geocoding HARRIMAN, TN (1/178)...
  → 35.9335, -84.5503 (confidence: 1)
Geocoding Buffalo, NY (2/178)...
  → 42.8908, -78.8529 (confidence: 1)
Geocoding Rockville, MD (3/178)...
  → 39.0800, -77.1585 (confidence: 1)
Geocoding DENVER, NC (4/178)...
  → 35.5355, -81.0337 (confidence: 1)
Geocoding OGDEN, UT (5/178)...
  → 41.2319, -111.9624 (confidence: 1)
Geocoding Baltimore, MD (6/178)...
  → 39.3197, -76.6495 (confidence: 1)
Geocoding SIGNAL MOUNTAIN, TN (7/178)...
  → 35.1406, -85.3394 (confidence: 1)
Geocoding NORTH VERNON, IN (8/178)...
  → 39.0036, -85.6312 (confidence: 1)
Geocoding Downers Grove, IL (9/178)...
  → 41.8014, -88.0073 (confidence: 1)
Geocoding LITT, TX (10/178)...
  → 31.0310, -98.3263 (confidence: 0.3)
Geocoding ARLINGTON, TX (11/178)...
  → 32.6857, -97.1029 (confidence: 1)
Geocoding WESTCHESTER, IL (12/178)...
  → 41.8483, -87.8890 (confidence: 1)
Geocoding Bardstown, KY (13/178)...
  → 37.8198, -85.4611 (confidence: 1)
Geocoding GRAND PR

In [80]:
prompt = f"""Please estimate the latitude and longitude of the following address.
Use your knowledge of the city, its road systems and general conventions of geography
to make your estimate. Be as accurate and precise as possible.

Results should be returned as a JSON object with three attributes, latitude, longitude
and confidence.

Latitude and longitude should be in decimal degrees. Confidence should be a number
between 0 and 1 and should reflect your confidence in the accuracy of the estimate.

Here is the address to estimate:

"""

In [81]:
%%skip_if REPROCESS_DATA == False

# Use LLM to geocode addresses
print("Geocoding addresses with LLM...")

# Initialize new columns for LLM results
address_data['llm_lat'] = None
address_data['llm_lon'] = None
address_data['llm_confidence'] = None

# Process addresses
for i, (idx, row) in enumerate(address_data.iterrows()):
    address = row['address_full']
    print(f"LLM Geocoding {address} ({i+1}/{len(address_data)})...")
    
    try:
        # Create the full prompt with the address
        full_prompt = prompt + address
        
        # Call the LLM
        response = call_llm(full_prompt)
        
        # Parse the JSON response
        result = json.loads(response)
        
        # Extract coordinates and confidence
        address_data.loc[idx, 'llm_lat'] = result.get('latitude')
        address_data.loc[idx, 'llm_lon'] = result.get('longitude')
        address_data.loc[idx, 'llm_confidence'] = result.get('confidence')
        
        print(f"  → {result.get('latitude')}, {result.get('longitude')} (confidence: {result.get('confidence')})")
        
    except Exception as e:
        print(f"  → Failed to geocode with LLM: {str(e)}")
        address_data.loc[idx, 'llm_lat'] = None
        address_data.loc[idx, 'llm_lon'] = None
        address_data.loc[idx, 'llm_confidence'] = None
    
    # Add a small delay to avoid rate limiting
    time.sleep(1)

# Save the updated dataframe
output_file = 'data/nadsample_geocode.txt'
address_data.to_csv(output_file, index=False)
print(f"\nLLM geocoded data saved to {output_file}")

# Show summary
successful_llm_geocodes = address_data['llm_lat'].notna().sum()
print(f"\nLLM Geocoding Summary:")
print(f"Total addresses processed: {len(address_data)}")
print(f"Successfully geocoded: {successful_llm_geocodes}/{len(address_data)} addresses")


Geocoding addresses with LLM...
LLM Geocoding 504 TRENTON Street HARRIMAN, TN, 37748 (1/188)...
  → 35.9336, -84.5522 (confidence: 0.9)
LLM Geocoding 22 Nagel Drive Buffalo, NY, 14225 (2/188)...
  → 42.9295, -78.7548 (confidence: 0.9)
LLM Geocoding 5836 Hubbard Drive Rockville, MD, 20852 (3/188)...
  → 39.0522, -77.1208 (confidence: 0.9)
LLM Geocoding 2891 LAKE SHORE Road South DENVER, NC, 28037 (4/188)...
  → 35.5053, -81.0043 (confidence: 0.9)
LLM Geocoding 7544 South 2020 East OGDEN, UT, 84405 (5/188)...
  → 41.1396, -111.9484 (confidence: 0.9)
LLM Geocoding 4626 Parkside Drive Baltimore, MD, 21206 (6/188)...
  → 39.3445, -76.5568 (confidence: 0.9)
LLM Geocoding 4162 WINDTREE Drive SIGNAL MOUNTAIN, TN, 37377 (7/188)...
  → 35.1333, -85.3433 (confidence: 0.9)
LLM Geocoding 2225 East County Road 740 North NORTH VERNON, IN, 47265 (8/188)...
  → 39.0105, -85.6053 (confidence: 0.8)
LLM Geocoding 4250 SARATOGA Avenue Downers Grove, IL, 60515 (9/188)...
  → 41.8164, -88.0112 (confidence: 0

## Evaluate results

Here we estimate the error between the ground truth lat/lon pairs (provided in the NAD) with both the city centers and the LLM guesses. We use geodesic distance to calculate our error metrics.
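For intuition about what the distance metric measures, here is a standard-library sketch of the haversine formula. It is a spherical approximation, not the WGS84 ellipsoidal geodesic that this notebook computes with `geographiclib`, so the two can differ slightly (typically by well under one percent). The function name `haversine_m` and the mean Earth radius constant are choices made for this illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical assumption)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot near antipodes
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))
```

For example, `haversine_m(35.934435, -84.549231, 35.9335, -84.5503)` gives the approximate distance between the first sample address and its geocoded city center.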

In [82]:
import numpy as np
from geographiclib.geodesic import Geodesic

def geodesic_dist_m(lat1, lon1, lat2, lon2):
    return Geodesic.WGS84.Inverse(lat1, lon1, lat2, lon2)['s12']  # meters

def mean_geodesic_error_m(lat_true, lon_true, lat_pred, lon_pred):
    d = [geodesic_dist_m(lt, ln, lp, lo)
         for lt, ln, lp, lo in zip(lat_true, lon_true, lat_pred, lon_pred)]
    return float(np.mean(d))

In [83]:
# Evaluate geocoding performance by calculating geodesic distances
print("Calculating geocoding errors...")

if not REPROCESS_DATA:
    # Read zip as a string so leading zeros survive the round trip through CSV
    address_data = pd.read_csv('data/nadsample_geocode.txt', dtype={'zip': str})

# Initialize error columns
address_data['city_error_m'] = None
address_data['llm_error_m'] = None

# Calculate errors for each row
for idx, row in address_data.iterrows():
    # Get ground truth coordinates
    gt_lat = row['latitude']
    gt_lon = row['longitude']
    
    # Calculate city center error
    if pd.notna(row['city_lat']) and pd.notna(row['city_lon']):
        city_error = geodesic_dist_m(gt_lat, gt_lon, row['city_lat'], row['city_lon'])
        address_data.loc[idx, 'city_error_m'] = city_error
    else:
        address_data.loc[idx, 'city_error_m'] = None
    
    # Calculate LLM error
    if pd.notna(row['llm_lat']) and pd.notna(row['llm_lon']):
        llm_error = geodesic_dist_m(gt_lat, gt_lon, row['llm_lat'], row['llm_lon'])
        address_data.loc[idx, 'llm_error_m'] = llm_error
    else:
        address_data.loc[idx, 'llm_error_m'] = None

# Calculate summary statistics
city_errors = address_data['city_error_m'].dropna()
llm_errors = address_data['llm_error_m'].dropna()

print(f"\nPerformance Summary:")
print(f"Total addresses: {len(address_data)}")
print(f"City center geocoding results: {len(city_errors)}")
print(f"LLM geocoding results: {len(llm_errors)}")

if len(city_errors) > 0:
    print(f"\nCity Center Geocoding Performance:")
    print(f"  Mean error: {city_errors.mean():.1f} meters ({city_errors.mean()/1609.34:.2f} miles)")
    print(f"  Median error: {city_errors.median():.1f} meters ({city_errors.median()/1609.34:.2f} miles)")
    print(f"  Max error: {city_errors.max():.1f} meters ({city_errors.max()/1609.34:.2f} miles)")

if len(llm_errors) > 0:
    print(f"\nLLM Geocoding Performance:")
    print(f"  Mean error: {llm_errors.mean():.1f} meters ({llm_errors.mean()/1609.34:.2f} miles)")
    print(f"  Median error: {llm_errors.median():.1f} meters ({llm_errors.median()/1609.34:.2f} miles)")
    print(f"  Max error: {llm_errors.max():.1f} meters ({llm_errors.max()/1609.34:.2f} miles)")

# Save the results
output_file = 'data/nadsample_geocode.txt'
address_data.to_csv(output_file, index=False)
print(f"\nResults saved to {output_file}")

# Show a sample of the results
print(f"\nSample results:")
sample_data = address_data[['address_full', 'latitude', 'longitude', 'city_error_m', 'llm_error_m']].head(10)
print(sample_data)

Calculating geocoding errors...

Performance Summary:
Total addresses: 188
City center geocoding results: 188
LLM geocoding results: 188

City Center Geocoding Performance:
  Mean error: 116808.5 meters (72.58 miles)
  Median error: 4029.7 meters (2.50 miles)
  Max error: 13906920.1 meters (8641.38 miles)

LLM Geocoding Performance:
  Mean error: 4422.6 meters (2.75 miles)
  Median error: 2316.0 meters (1.44 miles)
  Max error: 91843.4 meters (57.07 miles)

Results saved to data/nadsample_geocode.txt

Sample results:
                                         address_full   latitude   longitude  \
0              504 TRENTON Street HARRIMAN, TN, 37748  35.934435  -84.549231   
1                   22 Nagel Drive Buffalo, NY, 14225  42.919926  -78.746159   
2             5836 Hubbard Drive Rockville, MD, 20852  39.054235  -77.119597   
4        2891 LAKE SHORE Road South DENVER, NC, 28037  35.497915  -80.981462   
5               7544 South 2020 East OGDEN, UT, 84405  41.129919 -111.926519 

## Results

The results in this limited case suggest that LLM guesses, though not especially accurate, land closer (as the crow flies) to each address's ground-truth latitude and longitude than city centers do: the median error is 1.44 miles for the LLM versus 2.50 miles for city centers.

The mean city-center error (72.58 miles) is skewed by a few city centers that the geocoder apparently placed in the wrong country entirely; the maximum error is 8,641 miles. This is a geocoder error that we could almost certainly fix if we ran it again.
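One way to catch such cases before they reach the error statistics is a coarse sanity check on the geocoder output. The sketch below is an illustration, not something the original experiment ran. It assumes a dictionary shaped like the notebook's `city_geocodes`, uses a rough bounding box for the 50 US states that I chose by hand, and treats a confidence below an arbitrary 0.5 threshold as suspect. The name `flag_suspect_cities` is hypothetical.

```python
# Rough bounding box for the 50 US states (an assumption for illustration;
# it ignores territories and the Aleutian islands west of the antimeridian).
US_LAT_RANGE = (18.0, 72.0)
US_LON_RANGE = (-180.0, -66.0)

def flag_suspect_cities(city_geocodes, min_confidence=0.5):
    """Return (city_state, reason) pairs for geocodes that look unreliable."""
    suspects = []
    for city_state, rec in city_geocodes.items():
        lat = rec.get("city_lat")
        lon = rec.get("city_lon")
        conf = rec.get("city_confidence")

        if lat is None or lon is None:
            suspects.append((city_state, "missing"))
        elif not (US_LAT_RANGE[0] <= lat <= US_LAT_RANGE[1]
                  and US_LON_RANGE[0] <= lon <= US_LON_RANGE[1]):
            suspects.append((city_state, "outside bounds"))
        elif conf is not None and conf < min_confidence:
            suspects.append((city_state, "low confidence"))
    return suspects
```

A bounding box only catches results in the wrong part of the world. A wrong match inside the US, such as the low-confidence result logged for `LITT, TX`, would be caught only by the confidence threshold, if at all.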

It would also be worth trying GPT-5, which I avoided doing mostly because it's very slow ...