<a href="https://colab.research.google.com/github/juergenlandauer/caa2025/blob/main/SiteDetectionSatellite/CAA2025_Gemini_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Automatic site detection in satellite and LiDAR images with Gemini by Google

Author: Juergen Landauer (juergen (AT) landauer-ai.de)

To start, go to the "Input parameters" section below and review or (optionally) adjust the parameters. Then run the entire notebook by choosing Runtime -> Run all in the menu above.


### Set up your API key and install the Gemini Python SDK

To access Gemini, you need to provide your Google Gemini API key. Follow these steps:

- Register with Google (an existing Gmail account also works).
- Log in here and get your API key: https://aistudio.google.com/apikey
- Click `Copy`. This places your private key on the clipboard.
- In Colab, go to the left pane and click on `Secrets` (🔑).
- Store the key under the name `GOOGLE_API_KEY`.

Details are found here: https://github.com/google-gemini/gemini-api-cookbook/blob/main/quickstarts/Authentication.ipynb.
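
As a minimal sketch (not part of the original notebook), the key lookup can be wrapped in a small helper that falls back to an environment variable when the code runs outside Colab. The helper name `load_api_key` and the environment-variable fallback are assumptions made for illustration.

```python
import os

def load_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from Colab Secrets, else from the environment."""
    try:
        # This module is only available inside Google Colab
        from google.colab import userdata
        key = userdata.get(name)
    except ImportError:
        # Assumption: outside Colab the key is stored in an environment variable
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"No API key found under the name {name!r}")
    return key
```

Failing early with a clear message is easier to diagnose than an authentication error raised later by the first API call.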

In [None]:
# The `from google import genai` import and `genai.Client` used below come from the google-genai package
!pip install -Uq google-genai

In [None]:
from google import genai
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

client = genai.Client(api_key=GOOGLE_API_KEY)

# Input parameters

Review all parameters in this section and (optionally) adjust them on the right side. For example, you can use your own input ZIP file by providing a URL.

#### Demo with Bavarian castles for CAA2025

Feel free to replace this with your own imagery by providing a download URL (e.g. from Google Drive).

Note that the ZIP file must contain two sub-folders called "sites" and "nonsites". The "sites" folder must contain images of sites, and the "nonsites" folder must contain samples of other landscape (non-sites).
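
Before uploading your own archive, the required layout can be checked locally. The following is a minimal sketch, assuming both folders sit at the top level of the archive (the notebook unzips into `input` and then reads `input/sites` and `input/nonsites`); the function name `missing_folders` is hypothetical.

```python
import zipfile

REQUIRED_FOLDERS = ("sites", "nonsites")

def missing_folders(zip_path):
    """Return the required top-level folders that contain no files."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    missing = []
    for folder in REQUIRED_FOLDERS:
        prefix = folder + "/"
        # A folder counts as present if at least one file lives inside it
        has_files = any(n.startswith(prefix) and not n.endswith("/") for n in names)
        if not has_files:
            missing.append(folder)
    return missing
```

An empty list means the archive has both folders; otherwise the list names what is missing.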


In [None]:
INPUT_ZIP_URL = 'https://www.dropbox.com/scl/fi/uuirq92v5evwy8znojxje/sites.zip?rlkey=xycrlu44gdqmo5k5j439tzca7&dl=0' # @param {"allow-input":true}

#### The text 'prompt' sent to the Foundation Model.

Play with different variations of the text and don't forget to include the object type you are looking for.

In [None]:
PROMPT = "This is a satellite image from Germany possibly containing castles or ruins." # @param {"allow-input":true}


This appendix to the prompt ensures that the AI response is formatted in the right way. Do NOT change it!
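
To make the expected format concrete, here is a minimal sketch of the JSON shape that the later cells rely on: a list of objects with the fields `detection_type`, `probability` and `bbox`, where `bbox` is read as `[y1, x1, y2, x2]` on a 0 to 1000 scale. The sample reply is hand-written for illustration and is not real model output; `parse_detections` is a hypothetical helper.

```python
import json

# Hand-written sample, invented for illustration; not real model output
sample_response = '''
[
  {"detection_type": "castle", "probability": 0.9, "bbox": [120, 340, 410, 720]}
]
'''

def parse_detections(text):
    """Parse the JSON reply and check that each item has the expected fields."""
    detections = json.loads(text)
    for item in detections:
        if not isinstance(item.get("detection_type"), str):
            raise ValueError("detection_type must be a string")
        bbox = item.get("bbox")
        if not (isinstance(bbox, list) and len(bbox) == 4):
            raise ValueError("bbox must be a list of four numbers")
    return detections

detections = parse_detections(sample_response)
```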

In [None]:
PROMPT += " For each of the objects in this image return its type, its probability and bounding box."

The text response from the AI is sometimes ambiguous. If you only (or "strictly") want to see certain types of responses, keep this set to True and provide a list of keywords that must appear in the output. Take care to preserve the Python list syntax.
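
The filtering idea can be sketched as a small standalone function. This is an illustration, not the notebook's exact code: the name `keep_detection` is hypothetical, and the case-insensitive comparison is an added assumption (a plain substring test would treat "Castle" and "castle" differently).

```python
def keep_detection(detection_type, keywords, strict=True):
    """Decide whether a detection should be shown."""
    if not strict:
        return True  # no filtering requested: keep everything
    label = detection_type.lower()
    # Keep the detection if any keyword occurs somewhere in its type label
    return any(keyword.lower() in label for keyword in keywords)
```

For example, with the keywords `["castle", "ruin"]` a label such as "castle ruin" is kept, while "church" is dropped unless `strict` is False.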

In [None]:
STRICT_RESPONSE_FILTERING = True # @param ['True', 'False']

FILTER_KEYWORDS = ["castle", "ruin"]  # @param {"allow-input":true}
#FILTER_KEYWORDS = ["hillfort", "enclosure"]  # @param {"allow-input":true}


Here we define the model we are using. Usually there is no need to change this.

In [None]:
MODEL_ID="gemini-2.0-flash" # @param ["gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-pro-exp-03-25"] {"allow-input":true, isTemplate: true}

# here we determine the RPM of these models (RPM = Requests Per Minute)
if "lite" in MODEL_ID:  RPM = 30
elif "exp" in MODEL_ID: RPM = 10 # experimental models, e.g. "gemini-2.5-pro-exp-03-25"
else:                   RPM = 15 # "flash"

## We load the data and unzip it into the directory 'input'

In [None]:
!rm -rf input output file.zip
!mkdir -p input/
!wget -O file.zip "$INPUT_ZIP_URL"
!unzip -q file.zip -d input


### We now import some libraries

In [None]:
import numpy as np
import os
from osgeo import gdal
import cv2 as cv
import PIL.Image
import matplotlib.pyplot as plt
from glob import glob
from pathlib import Path
import time
from tqdm import tqdm
from google.colab import files as colabfiles
import json
import random
import io
from PIL import Image, ImageDraw
from PIL import ImageColor

### Some utility functions we use

In [None]:
# function to read the images as PIL
def read_pil(fpath):
  pilimg = PIL.Image.open(fpath)
  return pilimg

In [None]:
# modified from https://github.com/google-gemini/cookbook/blob/main/quickstarts/Spatial_understanding.ipynb
# function to plot bounding boxes on images
def plot_bounding_boxes(img, noun_phrases_and_positions):
    width, height = img.size
    # Create a drawing object
    draw = ImageDraw.Draw(img)

    # Iterate over the noun phrases and their positions
    for i, (noun_phrase, prob, (y1, x1, y2, x2)) in enumerate(
        noun_phrases_and_positions):
        # Select a color from the list
        #color = colors[i % len(colors)]
        color = 'yellow'
        #if prob >0.88: color = 'red' else: color = 'blue'

        # Convert normalized coordinates to absolute coordinates
        abs_x1 = int(x1/1000 * width)
        abs_y1 = int(y1/1000 * height)
        abs_x2 = int(x2/1000 * width)
        abs_y2 = int(y2/1000 * height)

        # Ensure x1 <= x2 and y1 <= y2
        abs_x1, abs_x2 = sorted([abs_x1, abs_x2])  # Sort x-coordinates
        abs_y1, abs_y2 = sorted([abs_y1, abs_y2])  # Sort y-coordinates

        # Draw the bounding box
        draw.rectangle(
            ((abs_x1, abs_y1), (abs_x2, abs_y2)), outline=color, width=4
        )

        # Draw the text
        draw.text((abs_x1 + 8, abs_y1 + 6), noun_phrase, fill=color)

    # Display the image
    img.show()
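
The coordinate conversion used in `plot_bounding_boxes` can be shown in isolation. This is a minimal sketch; the helper name `to_pixel_box` is hypothetical. It takes a box given as `[y1, x1, y2, x2]` on a 0 to 1000 scale and returns pixel coordinates ordered as left, top, right, bottom.

```python
def to_pixel_box(bbox, width, height):
    """Convert [y1, x1, y2, x2] on a 0-1000 scale to pixel (left, top, right, bottom)."""
    y1, x1, y2, x2 = bbox
    # Scale each coordinate by the image size, then sort so that left <= right and top <= bottom
    left, right = sorted([int(x1 / 1000 * width), int(x2 / 1000 * width)])
    top, bottom = sorted([int(y1 / 1000 * height), int(y2 / 1000 * height)])
    return left, top, right, bottom

print(to_pixel_box([250, 125, 750, 500], width=640, height=480))  # (80, 120, 320, 360)
```

Note that the vertical coordinate comes first in the input, which is easy to mix up with the more common x-first convention.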

# Process folder

In [None]:
sites = glob('./input/sites/*.*')
nonsites = glob('./input/nonsites/*.*')

In [None]:
# uncomment this if you want to try just a small sample of MAX_N for each class
#import random
#MAX_N = 10

#sites = random.sample(sites, MAX_N)
#nonsites = random.sample(nonsites, MAX_N)

In [None]:
# define the Gemini output format of a 'Detection'
from pydantic import BaseModel, TypeAdapter
class Detection(BaseModel):
  detection_type: str
  probability: float
  bbox: list[int]

### First upload all files to Gemini

In [None]:
files = sites + nonsites

refs = []            # Gemini file references
uploaded_files = []  # local paths, kept aligned with refs
for fpath in tqdm(files):
  img = read_pil(fpath)
  img.save('./tmp.png')

  # Retry logic with exponential backoff
  retries = 3
  delay = 1
  for i in range(retries):
    try:
      file_ref = client.files.upload(file='./tmp.png')
      refs.append(file_ref)
      uploaded_files.append(fpath)
      break  # Exit retry loop if successful
    except ConnectionError as e:
      print(f"Connection error during upload, retrying in {delay} seconds... Attempt {i+1}/{retries}")
      time.sleep(delay)
      delay *= 2  # Exponential backoff
    except Exception as e:
      print(f"An unexpected error occurred: {e}")
      break
  else:
    # for-else: runs only if the retry loop ended without a break
    print(f"Failed to upload {fpath} after {retries} retries; skipping it.")

# Keep only the files that were uploaded, so that files and refs stay aligned
files = uploaded_files
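
The retry pattern with exponential backoff can also be written as a reusable helper. This is a minimal sketch under stated assumptions: the name `call_with_backoff` and its parameters are hypothetical, and only `ConnectionError` is treated as retryable, mirroring the upload loop.

```python
import time

def call_with_backoff(func, retries=3, first_delay=1.0, sleep=time.sleep):
    """Call func(); on ConnectionError wait and retry, doubling the delay each time."""
    delay = first_delay
    for attempt in range(1, retries + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == retries:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay)
            delay *= 2  # exponential backoff: 1 s, 2 s, 4 s, ...
```

Passing `sleep` as a parameter makes the helper easy to try out without real waiting, for example by handing in a function that only records the requested delays.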


100%|██████████| 679/679 [15:57<00:00,  1.41s/it]


## Processing all files with Gemini

In [None]:
!rm -rf output
!mkdir output

In [None]:
with open('./output/phrases.txt', 'w') as docfile:
  for fref, fpath in zip(refs, files):
    response = client.models.generate_content(model = MODEL_ID, contents = [fref, PROMPT],
        config={'response_mime_type': 'application/json',
                'response_schema': list[Detection]})

    print('________________________________________________________')
    print(fpath, "-- response:", response.text)
    objects = json.loads(response.text)

    # results visualization
    img = read_pil(fpath)
    phrases_boxes = []
    for item in objects: # for each detection
      prob = item.get('probability', 0.)  # default to 0 if the model omitted it
      #if prob < 0.9: continue # not a valid detection
      detection_type = item['detection_type']
      # With strict filtering, keep only detections whose type contains a keyword;
      # without it, keep every detection
      if not STRICT_RESPONSE_FILTERING or any(element in detection_type for element in FILTER_KEYWORDS):
          bbox = item['bbox']
          print (detection_type, prob, bbox)
          phrases_boxes += [(detection_type+"_"+str(prob), float(prob), bbox)]

    if phrases_boxes != []:
      plot_bounding_boxes(img, noun_phrases_and_positions=phrases_boxes)

    filename = Path(fpath).name
    img.save(Path('output')/filename)
    display(img.resize(size=(384, 384)))
    docfile.write(filename + ":" + str(objects) + os.linesep)
    #break
    time.sleep(60//RPM + 2) # sleep to meet RPM limit (requests per minute)
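
The pause between requests follows directly from the RPM limit. As a small sketch (the function names are hypothetical), the arithmetic of `60//RPM + 2` and the resulting minimum waiting time for a batch look like this:

```python
def seconds_between_requests(rpm, safety_margin=2):
    """Whole seconds to wait per request so that the RPM limit is not exceeded."""
    return 60 // rpm + safety_margin

def minimum_wait_minutes(n_files, rpm):
    """Total sleep time for n_files requests, ignoring the time the API calls take."""
    return n_files * seconds_between_requests(rpm) / 60

print(seconds_between_requests(15))  # 6
print(seconds_between_requests(30))  # 4
```

For the 679 images of the demo data set at 15 RPM, this amounts to roughly 68 minutes of waiting alone.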

## Export results for download
We now open a file download dialog for output.zip. Simply save the file on your local computer. Done :-)

output.zip contains all images with bounding box annotations and a file phrases.txt containing the original response from Gemini.
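
Each line of phrases.txt has the form `filename:<Python list of detections>`, so it can be read back after download. The following is a minimal sketch, assuming the file name itself contains no colon; `parse_phrases_line` is a hypothetical helper and the sample line is invented for illustration.

```python
import ast

def parse_phrases_line(line):
    """Split 'filename:[...]' into the file name and the list of detections."""
    # Split at the first colon only; the list part may contain further colons
    filename, _, payload = line.strip().partition(":")
    # The list was written with str(), so it is a Python literal rather than JSON
    return filename, ast.literal_eval(payload)

# Invented sample line, not taken from a real run
sample = "castle_01.png:[{'detection_type': 'castle', 'probability': 0.9, 'bbox': [120, 340, 410, 720]}]"
name, detections = parse_phrases_line(sample)
```

`ast.literal_eval` is used instead of `json.loads` because the notebook writes the list with single quotes, which is not valid JSON.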



In [None]:
!zip -r output.zip output

In [None]:
colabfiles.download('output.zip')
