35 commits
76679d0
Start of Gemini component
ZachCafego Mar 13, 2025
b02e2bd
Made changes to unittests, fixed mocking
ZachCafego Mar 19, 2025
913cee3
Now include prompt files in module.
jrobble Apr 3, 2025
389390e
Improve prompts.
jrobble Apr 3, 2025
fb9ad28
Enable generating JSON output for people and vehicles.
jrobble Apr 3, 2025
5d65585
Allow features with a value of "true".
jrobble Apr 3, 2025
4e2b197
Ignore unsure prefixes.
jrobble Apr 3, 2025
0814cd2
Flatten output for vehicles.
jrobble Apr 4, 2025
0f2d7ba
Always add "ANNOTATED BY GEMINI".
jrobble Apr 4, 2025
b5bf823
Output CLASSIFICATION property.
jrobble Apr 8, 2025
912c084
Added support for MODEL_NAME and backoff for model rate limit errors
Jun 3, 2025
4ae286f
Fixed issues from pr 401, besides those relating to backoff
Jun 6, 2025
f4f3862
Altered the exception for 429 to catch the error code
Jun 6, 2025
2c44b0a
Removing redundant assignments
Jun 10, 2025
0e7d8db
Merge develop changes to kburkewv/feat/gemini-detection
Jun 10, 2025
74de3b5
Merge branch develop into kburkewv/feat/gemini-detection
Jun 10, 2025
8ddb41b
Image and video processing are supported through SharedMemory
Jun 13, 2025
9dd2c3c
Feed forward implementation
Jun 13, 2025
6fed115
Added ff support for images
Jun 16, 2025
231bb47
Added geminidetection ff with markup
Jun 17, 2025
9c80941
Added unittests and data
Jun 17, 2025
382ed65
Merge w/ markup change
Jun 17, 2025
7de957a
Structured output prompts for person and vehicle
Jun 18, 2025
52ee803
Merge remote-tracking branch 'origin/develop' into kmburke/feat/gemin…
Jul 17, 2025
954fce4
Bug fixes for color and SHM
Jul 17, 2025
3fd8f6d
Fixed a bug where if an image is too large it caused SharedMemory to …
Jul 18, 2025
f80d252
Removed the close() call in gemini-process-image
Jul 18, 2025
c9b437b
Changes to prompt and image processing
Jul 18, 2025
1608600
Updated prompts and finalized details
Jul 21, 2025
18767fa
Use OpenCV to convert color profile.
jrobble Jul 29, 2025
d9431d5
Improve shared memory resource cleanup.
jrobble Jul 30, 2025
161dd01
Add monkey patch for resource tracker.
jrobble Jul 30, 2025
11fb07b
Added GENERATION_MAX_ATTEMPTS and removed redundant code
Jul 31, 2025
d86f086
Removed obsolete file
Jul 31, 2025
ecb92a3
Updated the README
Jul 31, 2025
67 changes: 67 additions & 0 deletions python/GeminiDetection/Dockerfile
@@ -0,0 +1,67 @@
# syntax=docker/dockerfile:experimental

#############################################################################
# NOTICE #
# #
# This software (or technical data) was produced for the U.S. Government #
# under contract, and is subject to the Rights in Data-General Clause #
# 52.227-14, Alt. IV (DEC 2007). #
# #
# Copyright 2024 The MITRE Corporation. All Rights Reserved. #
#############################################################################

#############################################################################
# Copyright 2024 The MITRE Corporation #
# #
# Licensed under the Apache License, Version 2.0 (the "License"); #
# you may not use this file except in compliance with the License. #
# You may obtain a copy of the License at #
# #
# http://www.apache.org/licenses/LICENSE-2.0 #
# #
# Unless required by applicable law or agreed to in writing, software #
# distributed under the License is distributed on an "AS IS" BASIS, #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. #
# See the License for the specific language governing permissions and #
# limitations under the License. #
#############################################################################

ARG BUILD_REGISTRY
ARG BUILD_TAG=latest
FROM ${BUILD_REGISTRY}openmpf_python_executor_ssb:${BUILD_TAG}

RUN --mount=type=tmpfs,target=/var/cache/apt \
    --mount=type=tmpfs,target=/var/lib/apt/lists \
    --mount=type=tmpfs,target=/tmp \
    apt-get update; \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y wget \
        # For Google Gemini
        # After installing the following /usr/bin will have:
        #   python3 -> python3.8
        #   python3.8
        #   python3.9
        python3.9 python3.9-venv libpython3.9

# Create separate venv for Python 3.9 subprocess
RUN mkdir -p /gemini-subprocess/venv; \
    python3.9 -m venv /gemini-subprocess/venv; \
    /gemini-subprocess/venv/bin/pip3 install google-genai pillow numpy

COPY gemini-process-image.py gemini_component/resource_tracker_monkeypatch.py /gemini-subprocess/

RUN pip3 install --upgrade pip

RUN pip3 install tenacity opencv-python

ARG RUN_TESTS=false

RUN --mount=target=.,readwrite \
    install-component.sh; \
    if [ "${RUN_TESTS,,}" == true ]; then python tests/test_gemini.py; fi

LABEL org.label-schema.license="Apache 2.0" \
      org.label-schema.name="OpenMPF Gemini Detection" \
      org.label-schema.schema-version="1.0" \
      org.label-schema.url="https://openmpf.github.io" \
      org.label-schema.vcs-url="https://github.com/openmpf/openmpf-components" \
      org.label-schema.vendor="MITRE"
63 changes: 63 additions & 0 deletions python/GeminiDetection/README.md
@@ -0,0 +1,63 @@
# Overview

This repository contains source code for the OpenMPF Gemini Detection Component.

This component utilizes a config file that contains any number of prompts for any number of object classes. These prompts and the images/video frames are passed to the Google Gemini server to generate responses.

# Job Properties

The following are the properties that can be specified for the component. All properties except for GEMINI_API_KEY and CLASSIFICATION have default values, making them optional to set.

- `GEMINI_API_KEY`: Your API key to send requests to Google Gemini
- `CLASSIFICATION`: The class of the object(s) in the media. Used to determine the prompt(s). Examples: PERSON and VEHICLE.
- `PROMPT_CONFIGURATION_PATH`: The path to a JSON file which contains prompts for the specified classifications.
- `JSON_PROMPT_CONFIGURATION_PATH`: The path to a JSON file which contains classes and prompts that instruct Gemini to return a JSON object.
- `ENABLE_JSON_PROMPT_FORMAT`: Enables returning a JSON-formatted response from Gemini, using the prompts in the file specified by the `JSON_PROMPT_CONFIGURATION_PATH` job property. Set to false by default.
- `GENERATE_FRAME_RATE_CAP`: The maximum number of frames to process per second of native video time within a video segment.
- `MODEL_NAME`: The model to use for Gemini inference. By default it is set to `"gemma-3-27b-it"`.
- `GENERATION_MAX_ATTEMPTS`: The maximum number of times the component will attempt to generate valid JSON output.
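
Two of the properties above describe behavior that may be easier to follow in code. The following is only a sketch of one way `GENERATE_FRAME_RATE_CAP` and `GENERATION_MAX_ATTEMPTS` could be applied; it is not taken from the component. The function names `frames_to_process` and `generate_json` are hypothetical, and treating a cap of zero or less as "no cap" is an assumption.

```python
import json
import math
from typing import Any, Callable, List


def frames_to_process(frame_count: int, fps: float, rate_cap: float) -> List[int]:
    """Pick frame indices so that no more than `rate_cap` frames are used
    per second of native video time."""
    if rate_cap <= 0 or rate_cap >= fps:
        # Assumption: a non-positive cap, or one above the native rate, means "use every frame".
        step = 1
    else:
        # Round the stride up so the cap is never exceeded.
        step = math.ceil(fps / rate_cap)
    return list(range(0, frame_count, step))


def generate_json(generate: Callable[[], str], max_attempts: int) -> Any:
    """Call `generate` until it returns text that parses as JSON,
    giving up after `max_attempts` tries."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return json.loads(generate())
        except json.JSONDecodeError as e:
            last_error = e  # invalid JSON: try again
    raise ValueError(f"No valid JSON after {max_attempts} attempts") from last_error


# A 2-second clip at 30 fps with a cap of 4 frames per second.
print(frames_to_process(60, 30.0, 4))  # [0, 8, 16, 24, 32, 40, 48, 56]
```

Here `generate` stands in for whatever function sends the prompt and image to the model and returns the response text.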

# Config File

The config file is a JSON-formatted file that tells the component which prompts to send to Gemini depending on the class of the object. Users can write their own config file and use it by setting the `PROMPT_CONFIGURATION_PATH` property. The following is an example of the proper syntax to follow:

```json
{
    "classPrompts": [
        {
            "classes": [
                "DOG",
                "CAT",
                "HORSE"
            ],
            "prompts": [
                {
                    "detectionProperty": "DESCRIPTION",
                    "prompt": "Describe the animal's color and appearance."
                }
            ]
        },
        {
            "classes": [
                "DOG"
            ],
            "prompts": [
                {
                    "detectionProperty": "DOG BREED",
                    "prompt": "Describe the potential breeds that this dog could contain."
                }
            ]
        }
    ]
}
```

Note that a class can appear in multiple entries in the JSON, such as `"DOG"` in the example. If you have multiple classes that share a prompt, you can list them together like above and then add more questions for each individual class if you wish to get more specific.

Also be sure to make each `"detectionProperty"` distinct for a given class so that none of your prompts are overwritten.
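
To illustrate how entries combine for a class, here is a minimal sketch that collects every prompt that applies to one classification. It assumes the list of class entries has already been read from the config file; `prompts_for_class` is a hypothetical helper, not part of the component, and raising on a duplicate `"detectionProperty"` is a choice made for this example so that a collision is reported instead of silently overwriting a prompt.

```python
from typing import Dict, List

# The class entries from the example config above.
ENTRIES = [
    {
        "classes": ["DOG", "CAT", "HORSE"],
        "prompts": [
            {"detectionProperty": "DESCRIPTION",
             "prompt": "Describe the animal's color and appearance."}
        ],
    },
    {
        "classes": ["DOG"],
        "prompts": [
            {"detectionProperty": "DOG BREED",
             "prompt": "Describe the potential breeds that this dog could contain."}
        ],
    },
]


def prompts_for_class(entries: List[dict], classification: str) -> Dict[str, str]:
    """Map each detectionProperty to its prompt for the given class."""
    resolved: Dict[str, str] = {}
    for entry in entries:
        if classification not in entry["classes"]:
            continue
        for item in entry["prompts"]:
            key = item["detectionProperty"]
            if key in resolved:
                # Two prompts would write to the same output property.
                raise ValueError(f"Duplicate detectionProperty {key!r} for {classification}")
            resolved[key] = item["prompt"]
    return resolved


print(sorted(prompts_for_class(ENTRIES, "DOG")))  # ['DESCRIPTION', 'DOG BREED']
print(sorted(prompts_for_class(ENTRIES, "CAT")))  # ['DESCRIPTION']
```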

# Outputs

Once the responses are generated, they are added to the `detection_properties` dictionary of the associated `ImageLocation` object. For each prompt, the key is specified by the `"detectionProperty"` field of the config JSON and the value is the Gemini-generated response.
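
The mapping from prompts to output properties can be sketched as follows. This is an illustration only: `FakeImageLocation` is a stand-in that models just the `detection_properties` field, and `add_responses` and `ask` are hypothetical names, with `ask` representing whatever call returns the model's text for a prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FakeImageLocation:
    """Stand-in for the real ImageLocation; only the field used here is modeled."""
    detection_properties: Dict[str, str] = field(default_factory=dict)


def add_responses(location: FakeImageLocation,
                  prompts: Dict[str, str],
                  ask: Callable[[str], str]) -> None:
    """Store one response per prompt, keyed by its detectionProperty."""
    for detection_property, prompt in prompts.items():
        location.detection_properties[detection_property] = ask(prompt)


location = FakeImageLocation()
add_responses(
    location,
    {"CLOTHING": "Describe what this person is wearing"},
    ask=lambda prompt: f"(response to: {prompt})",  # placeholder for the model call
)
print(location.detection_properties)
```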

# TODO

- Add functionality for generic class property detection
83 changes: 83 additions & 0 deletions python/GeminiDetection/gemini-process-image.py
@@ -0,0 +1,83 @@
#############################################################################
# NOTICE #
# #
# This software (or technical data) was produced for the U.S. Government #
# under contract, and is subject to the Rights in Data-General Clause #
# 52.227-14, Alt. IV (DEC 2007). #
# #
# Copyright 2024 The MITRE Corporation. All Rights Reserved. #
#############################################################################

#############################################################################
# Copyright 2024 The MITRE Corporation #
# #
# Licensed under the Apache License, Version 2.0 (the "License"); #
# you may not use this file except in compliance with the License. #
# You may obtain a copy of the License at #
# #
# http://www.apache.org/licenses/LICENSE-2.0 #
# #
# Unless required by applicable law or agreed to in writing, software #
# distributed under the License is distributed on an "AS IS" BASIS, #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. #
# See the License for the specific language governing permissions and #
# limitations under the License. #
#############################################################################

import argparse
import json
import sys
import numpy as np

from google import genai
from multiprocessing.shared_memory import SharedMemory
from google.genai.errors import ClientError
from PIL import Image

from resource_tracker_monkeypatch import remove_shm_from_resource_tracker

def main():
    parser = argparse.ArgumentParser(description='Sends image and prompt to Gemini Client for processing.')

    parser.add_argument("--model", "-m", type=str, default="gemma-3-27b-it", help="The name of the Gemini model to use.")
    parser.add_argument("--shm-name", type=str, required=True, help="Shared memory name for image data.")
    parser.add_argument("--shm-shape", type=str, required=True, help="Shape of the image in shared memory (JSON list).")
    parser.add_argument("--shm-dtype", type=str, required=True, help="Numpy dtype of the image in shared memory.")
    parser.add_argument("--prompt", "-p", type=str, required=True, help="The prompt you want to use with the image.")
    parser.add_argument("--api_key", "-a", type=str, required=True, help="Your API key for Gemini.")
    args = parser.parse_args()

    remove_shm_from_resource_tracker()

    shm = None

    try:
        # Attach to the existing shared memory block and view its contents as an image.
        shape = tuple(json.loads(args.shm_shape))
        dtype = np.dtype(args.shm_dtype)
        shm = SharedMemory(name=args.shm_name)

        np_img = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        image = Image.fromarray(np_img)

        # Send the prompt and image to the model and write the response text to stdout.
        client = genai.Client(api_key=args.api_key)
        content = client.models.generate_content(model=args.model, contents=[args.prompt, image])
        print(content.text)
        sys.exit(0)

    except ClientError as e:
        if hasattr(e, 'code') and e.code == 429:
            print("Caught a ResourceExhausted error (429 Too Many Requests)", file=sys.stderr)
        else:
            print(e, file=sys.stderr)
        sys.exit(1)

    except Exception as e:
        print(e, file=sys.stderr)
        sys.exit(1)

    finally:
        # Detach from the shared memory block without unlinking it.
        if shm:
            shm.close()


if __name__ == "__main__":
    main()
27 changes: 27 additions & 0 deletions python/GeminiDetection/gemini_component/__init__.py
@@ -0,0 +1,27 @@
#############################################################################
# NOTICE #
# #
# This software (or technical data) was produced for the U.S. Government #
# under contract, and is subject to the Rights in Data-General Clause #
# 52.227-14, Alt. IV (DEC 2007). #
# #
# Copyright 2025 The MITRE Corporation. All Rights Reserved. #
#############################################################################

#############################################################################
# Copyright 2025 The MITRE Corporation #
# #
# Licensed under the Apache License, Version 2.0 (the "License"); #
# you may not use this file except in compliance with the License. #
# You may obtain a copy of the License at #
# #
# http://www.apache.org/licenses/LICENSE-2.0 #
# #
# Unless required by applicable law or agreed to in writing, software #
# distributed under the License is distributed on an "AS IS" BASIS, #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. #
# See the License for the specific language governing permissions and #
# limitations under the License. #
#############################################################################

from .gemini_component import GeminiComponent
24 changes: 24 additions & 0 deletions python/GeminiDetection/gemini_component/data/json_prompts.json
@@ -0,0 +1,24 @@
{
    "classPrompts": [
        {
            "classes": [
                "PERSON"
            ],
            "prompts": [
                "If there is no person visible in the image, produce JSON matching this specification: \"person\": { \"visible_person\": false } Return person. If a person is visible, extract their features and include answers only if 100% confident, if not provide \"unsure\". If an attribute is not visible, set the value to \"not visible\". For clarification, facial features are permanent or consistent traits of the face that do not change with expressions or emotions. Examples include face shape, nose structure, lip shape, eye shape/spacing, jawline, cheekbones, and consistent marks like scars or moles. They do not include temporary expressions (e.g., smiling), emotions (e.g., sadness), or conditions like makeup or lighting. Produce JSON matching this specification: \"person\": { \"visible_person\": true, \"type\": (\"civilian\", \"guard\", \"public figure\"), \"clothing\": array<{\"type\": (ie. \"shirt\", \"pants\", \"dress\", \"t-shirt\", \"shorts\", \"skirt\", etc.), \"color\": string, \"describe\": string}>, \"age_range\": (\"minor/child\", \"adult\", \"elderly\"), \"gender\": string, \"skin_color\": (\"very fair\", \"fair\", \"medium\", \"olive\", \"brown\", \"black\"), \"race\": (\"american indian/alaska native\", \"asian\", \"black/african american\", \"hispanic/latino\", \"native hawaiian/pacific islander\", \"white\"), \"accessories\": array<{\"type\": string, \"color\": string, \"describe\": string}>, \"glasses\": {\"type\": string, \"color\": string, \"describe\": string}, \"object_in_hand\": array<{\"type\": clothing_enum, \"color\": string, \"describe\": string}>, \"shoes\": {\"type\": string, \"color\": string, \"describe\": string}, \"head_features\": {\"hair_color\": string, \"bald\": boolean, \"head_cover\": {\"type\": string, \"color\": string, \"describe\": string}}, \"tattoo_features\": {\"location\": string, \"color\": string, \"describe\": string}, \"face_features\": {\"eye_color\": (ie. \"brown\", \"blue\", \"green\", \"hazel\", \"gray\", \"amber\", \"violet\", etc.), \"facial_hair_color\": string, \"facial_features\": string}, \"action_performed\": string, \"background\": {\"type\": string, \"color\": string, \"describe\": string}, \"other_notable_characteristics\": string } Return: person"
            ]
        },
        {
            "classes": [
                "VEHICLE",
                "CAR",
                "TRUCK",
                "BUS",
                "MOTORBIKE"
            ],
            "prompts": [
                "If there is no vehicle visible in the image, produce JSON matching this specification: \"vehicle\": { \"visible_vehicle\": false } Return vehicle. If a vehicle is visible, extract its features and include answers only if 100% confident, if not provide \"unsure\". If an attribute is not visible, set the value to \"not visible\". Produce JSON matching this specification: \"vehicle\": { \"visible_vehicle\": true, \"make\": string, \"type\": string, \"color\": string, \"license_plate_state\": string, \"license_plate_number\": string, \"other_notable_characteristics\": string} Return: vehicle"
            ]
        }
    ]
}
39 changes: 39 additions & 0 deletions python/GeminiDetection/gemini_component/data/prompts.json
@@ -0,0 +1,39 @@
{
    "classPrompts": [
        {
            "classes": [
                "PERSON"
            ],
            "prompts": [
                {
                    "detectionProperty": "CLOTHING",
                    "prompt": "Describe what this person is wearing"
                },
                {
                    "detectionProperty": "ACTIVITY",
                    "prompt": "Describe what this person is doing"
                }
            ]
        },
        {
            "classes": [
                "VEHICLE",
                "CAR",
                "TRUCK",
                "BUS"
            ],
            "prompts": [
                {
                    "detectionProperty": "DESCRIPTION",
                    "prompt": "Describe this vehicle"
                }
            ]
        }
    ],
    "framePrompts": [
        {
            "detectionProperty": "LOCATION",
            "prompt": "Describe the location in this scene"
        }
    ]
}