<a href="https://colab.research.google.com/github/juergenlandauer/caa2025/blob/main/SiteDetectionLiDAR/CAA2025_GPT4_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Automatic site detection in satellite and LiDAR images with GPT-4 by OpenAI

Author: Juergen Landauer (juergen (AT) landauer-ai.de)

To start, first go to the "Input parameters" section below and review or (optionally) adjust parameters. Then run the entire Notebook by choosing Runtime->Run all in the menu above.


### Set up your API key and install the OpenAI Python SDK

To access GPT-4, you need to provide your OpenAI API key. Follow these steps:

- Register with OpenAI.
- Open your [`OpenAI Settings`](https://platform.openai.com/settings) page. Click `User API keys`, then `Create new secret key` to generate a new key. Click `Copy` to place your private key on the clipboard.
- In Colab, go to the left pane and click on `Secrets` (🔑).
- Store the OpenAI API key under the name `OPENAI_API_KEY`.


In [None]:
!pip install -Uq openai

In [None]:
from openai import OpenAI
from google.colab import userdata

OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
client = OpenAI(api_key=OPENAI_API_KEY)
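
The cell above only works inside Colab, because `google.colab.userdata` does not exist elsewhere. A minimal sketch of a fallback, assuming the key is also exported as an environment variable called `OPENAI_API_KEY`; the helper name `load_api_key` is hypothetical and not part of the original notebook:

```python
import os

def load_api_key(name="OPENAI_API_KEY"):
    """Return the key from Colab Secrets if possible, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            key = userdata.get(name)
            if key:
                return key
        except Exception:
            # Secret missing or notebook access not granted: try the environment next
            pass

    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"No API key found under the name {name!r}")
    return key
```

With this helper, the client could be created as `client = OpenAI(api_key=load_api_key())`.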

# Input parameters

Review all parameters in this section and (optionally) adjust them on the right side. For example, you can upload your own input ZIP file by providing a URL.

#### Demo with English hillforts for CAA2025

Feel free to replace this with your own imagery by providing a download URL (e.g. from Google Drive).

Note that the ZIP file must contain two sub-folders called "sites" and "nonsites". The "sites" folder must contain images of sites, and the "nonsites" folder images of other landscape (non-sites).
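
If you supply your own archive, it can help to check the folder layout before running the whole notebook. The following is a rough sketch using only the standard library; `check_zip_layout` is a hypothetical helper, and it assumes that both folders sit at the top level of the archive, because the notebook later reads `./input/sites/*.*` and `./input/nonsites/*.*`:

```python
import io
import zipfile

def check_zip_layout(zip_source, required=("sites", "nonsites")):
    """Count the files directly inside each required top-level folder."""
    counts = {name: 0 for name in required}
    with zipfile.ZipFile(zip_source) as zf:
        for entry in zf.namelist():
            if entry.endswith("/"):
                continue  # skip directory entries
            parts = entry.split("/")
            # Only files of the form "<folder>/<file>" are counted
            if len(parts) == 2 and parts[0] in counts:
                counts[parts[0]] += 1
    missing = [name for name, n in counts.items() if n == 0]
    if missing:
        raise ValueError(f"ZIP has no files in: {missing}")
    return counts

# Usage with a small in-memory archive (a stand-in for the downloaded file.zip)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("sites/a.png", b"")
    zf.writestr("nonsites/b.png", b"")
print(check_zip_layout(demo))
```

After the download step, the same check could be run as `check_zip_layout("file.zip")`.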


In [None]:
INPUT_ZIP_URL = 'https://www.dropbox.com/scl/fi/uubtlqbw62o1gi7n9542x/EnglandHillforts.zip?rlkey=dmewcvtgeoiuh9zu02qwfls4u&st=grvt8nyt&dl=0' # @param {"allow-input":true}

#### The text 'prompt' sent to the Foundation Model.

Play with different variations of the text and don't forget to include the object type you are looking for.

In [None]:
PROMPT = 'This is a LiDAR image possibly containing archaeological features from England, possibly also enclosures or hillforts.' # @param {"allow-input":true}

This appendix to the prompt ensures that the AI response is formatted in the right way. Do NOT change this!!!


In [None]:
PROMPT += " For each of the objects in this image return its type, its probability and bounding box in JSON format like [xmin, ymin, xmax, ymax]."

The text response from the AI is sometimes ambiguous. If you only (or "strictly") want to see certain types of responses, leave this set to True and provide a list of keywords that must appear in the output. Take care to keep the Python list syntax intact.

In [None]:
STRICT_RESPONSE_FILTERING = True # @param ['True', 'False']

FILTER_KEYWORDS = ["hillfort", "enclosure"]  # @param {"allow-input":true}
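
As an illustration of how these two parameters could be applied to a single detection, here is a small sketch. The function name `passes_filter` is hypothetical, the 0.5 probability threshold is taken from the processing loop further down, and unlike that loop this version matches keywords against the `detection_type` field only rather than the whole response string:

```python
def passes_filter(detection, keywords, strict=True, min_probability=0.5):
    """Decide whether one detection dict should be reported."""
    # Low-confidence detections are dropped regardless of the keyword setting
    if float(detection.get("probability", 0.0)) < min_probability:
        return False
    # Without strict filtering, every sufficiently confident detection passes
    if not strict:
        return True
    label = str(detection.get("detection_type", "")).lower()
    return any(keyword.lower() in label for keyword in keywords)
```

For example, `passes_filter({"detection_type": "Iron Age hillfort", "probability": 0.8}, ["hillfort", "enclosure"])` accepts the detection, while a "field system" label would only pass with `strict=False`.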

Here we define the model we are using. Usually there is no need to change this.

In [None]:
MODEL = "gpt-4.1-mini" # @param ["gpt-4.1-mini","gpt-4.1-nano", "gpt-4.1"] {"allow-input":true, isTemplate: true}


## We load the data and unzip it into the directory 'input'

In [None]:
!rm -rf input output file.zip
!mkdir -p input/
!wget -O file.zip "$INPUT_ZIP_URL"
!unzip -q file.zip -d input


### Now we import some libraries

In [None]:
import numpy as np
import os
import cv2 as cv
import PIL.Image
import json
import random
import io
from PIL import Image, ImageDraw, ImageFont
from PIL import ImageColor
import matplotlib.pyplot as plt
from glob import glob
from pathlib import Path
import time
import re
from tqdm import tqdm
from google.colab import files as colabfiles

### Some utility functions we use

In [None]:
# function to read the images as PIL
def read_pil(fpath):
  pilimg = PIL.Image.open(fpath)
  return pilimg

In [None]:
# Function to encode the image
import base64 # Import the base64 module
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

In [None]:
# function to plot bounding boxes on images
def plot_bounding_boxes(img, results):
    width, height = img.size
    # Create a drawing object
    draw = ImageDraw.Draw(img)
    fpath='/usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf'
    fontsize = 24
    font = ImageFont.truetype(fpath, fontsize)

    noun_phrase, prob, bbox = results

    if len(bbox) != 4: return # nothing found
    x1, y1, x2, y2 = bbox
    # Ensure x1 <= x2 and y1 <= y2
    x1, x2 = sorted([x1, x2])  # Sort x-coordinates
    y1, y2 = sorted([y1, y2])  # Sort y-coordinates

    color = 'yellow'
    # Draw the bounding box
    draw.rectangle(((x1, y1), (x2, y2)), outline=color, width=4)
    # Draw the text
    draw.text((x1 + 8, y1 + 6), noun_phrase+" "+str(prob), fill=color, font=font)
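
The model may return coordinates that fall partly outside the image. One possible safeguard is to clip each box before drawing it. This is only a sketch: `clamp_bbox` is a hypothetical helper, and it assumes that the model returns pixel coordinates in the order `[xmin, ymin, xmax, ymax]`, which the prompt requests but does not guarantee:

```python
def clamp_bbox(bbox, width, height):
    """Order the corners and clip them to the image area; return None if unusable."""
    if len(bbox) != 4:
        return None  # nothing found or malformed box
    x1, x2 = sorted((bbox[0], bbox[2]))
    y1, y2 = sorted((bbox[1], bbox[3]))
    # Clip to the valid pixel range of the image
    x1, x2 = max(0, x1), min(width - 1, x2)
    y1, y2 = max(0, y1), min(height - 1, y2)
    if x1 >= x2 or y1 >= y2:
        return None  # box lies outside the image or has no area
    return [x1, y1, x2, y2]
```

Inside `plot_bounding_boxes`, the result of `clamp_bbox(bbox, *img.size)` could replace the raw `bbox`, skipping the drawing step when it is `None`.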

# Process folder

In [None]:
# read files
sites = glob('./input/sites/*.*')
nonsites = glob('./input/nonsites/*.*')
len(sites), len(nonsites)

(379, 300)

In [None]:
# uncomment this if you want to try just a small sample of MAX_N for each class
#import random
#MAX_N = 10

#sites = random.sample(sites, MAX_N)
#nonsites = random.sample(nonsites, MAX_N)
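
Note that `random.sample` raises a `ValueError` when `MAX_N` exceeds the number of available files, and that it returns a different subset on every run. A small sketch of a more forgiving variant follows; `sample_up_to` and its `seed` parameter are hypothetical additions for illustration:

```python
import random

def sample_up_to(paths, max_n, seed=0):
    """Return at most max_n paths, chosen reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    ordered = sorted(paths)    # glob order is not guaranteed, so sort first
    return rng.sample(ordered, min(max_n, len(ordered)))
```

It would be used as `sites = sample_up_to(sites, MAX_N)` and `nonsites = sample_up_to(nonsites, MAX_N)`.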

In [None]:
# define the output format of a 'Detection' - CURRENTLY UNUSED HERE!
from pydantic import BaseModel
class Detection(BaseModel):
  detection_type: str
  probability: float
  bbox: list[int]

## Processing all files with GPT-4

In [None]:
!rm -rf output
!mkdir output

In [None]:
files = sites + nonsites

with open('./output/phrases.txt', 'w') as docfile:
  for fpath in files:
    img = read_pil(fpath)
    img.save('./tmp.png')
    base64_image = encode_image("./tmp.png")
    print('________________________________________________________')

    response_format = Detection

    response = client.responses.create(
        model = MODEL,
        temperature = 0.3,
        input=[{
           "role": "user",
           "content": [
              {"type": "input_text", "text": PROMPT},
              {"type": "input_image", "image_url": f"data:image/png;base64,{base64_image}",  # tmp.png is a PNG, so declare it as such
        },],}],
        text={
          "format": {
              "type": "json_schema",
              "name": "Detection",
            "schema": {
                "type": "object",
                "properties": {
                    "detection_type": {
                        "type": "string"
                    },
                    "probability": {
                        "type": "number"
                    },
                    "bbox": {
                        "type": "array",
                        "items": {
                            "type": "number"
                        }
                    },
                },
                "required": ["detection_type", "probability", "bbox"],
                "additionalProperties": False
            },
            "strict":True
          },},)

    outlist = response.output_text
    outlist = re.findall(r'({.+?})', outlist) # possibly cut into results list (if more than one result)

    for out in outlist:
      ret = json.loads(out)
      # Keyword matching only applies when strict filtering is on; the probability threshold always applies
      keyword_ok = (not STRICT_RESPONSE_FILTERING) or any(element in out.lower() for element in FILTER_KEYWORDS)
      FOUND = keyword_ok and float(ret['probability']) >= 0.5
      if FOUND:
        plot_bounding_boxes(img, (ret['detection_type'],ret['probability'], ret['bbox']))

      print (fpath, "------", FOUND, "---", ret)
      display(img.resize(size=(384,384)))

    filename = Path(fpath).name
    img.save(Path('output')/filename)
    docfile.write("--- " + fpath + ":" + json.dumps(outlist) + os.linesep)
    time.sleep(5)
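
The regular expression `({.+?})` used in the loop stops at the first closing brace, so it would split a response that contains nested objects. As a hedged alternative, the sketch below scans the text with the standard library's `json.JSONDecoder.raw_decode`; `parse_detections` is a hypothetical helper and not part of the original notebook:

```python
import json

def parse_detections(text):
    """Extract every top-level JSON object found in text, including objects inside a list."""
    decoder = json.JSONDecoder()
    found = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode returns the parsed object and the index where it ended
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON at this brace, keep scanning
            continue
        if isinstance(obj, dict):
            found.append(obj)
        pos = end
    return found
```

Each returned item is already a dictionary, so the separate `json.loads(out)` call would no longer be needed with this approach.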

## Export results for download
We now open a file download dialog for output.zip. Simply save the file on your local computer. Done :-)

output.zip contains all images with bounding box annotations and a file phrases.txt containing the original responses from the OpenAI model.



In [None]:
!zip -r output.zip output
colabfiles.download('output.zip')