<a href="https://colab.research.google.com/github/kyileiaye2021/SafeHome_AI/blob/main/SafeHome_AI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SafeHome AI
### Challenge:
Despite the growing adoption of smart home technologies, current systems still struggle with reliably detecting emergency hazards, such as gas/water leaks when no one is home, and accurately identifying potential security threats, often confusing homeowners with intruders. These limitations reduce trust in smart home automation and can lead to serious safety risks.

### Goal:
Develop a multi-agent AI system capable of analyzing diverse smart-home sensor inputs in real time to accurately detect hazards and security breaches, and deliver real-time alerts to the homeowner. The system should improve reliability, reduce false alarms, and enhance overall safety.

### Multi-agent System Architecture

- Input Routing Agent
- Hazard Agent
  - Candidate tools: air quality/pollution API, weather API, local alert/emergency risk API
- Security Agent
  - Candidate tools: vision camera tool for person detection, door lock/alarm control function tool, homeowner's geolocation (possibly the Google Maps API)
- Coordinator Agent

### Installing Google Agent Development Kit (ADK)
This project uses the Google Agent Development Kit (ADK) to build its agents.

In [None]:
!pip install google-adk



### Setting up Google API key

In [None]:
from google import genai
from google.genai import types
from google.colab import userdata
import os

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
print("Setup and authentication complete.")

Setup and authentication complete.
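
The cell above only works inside Colab, because `google.colab` is not importable elsewhere. A minimal sketch of a more portable lookup, assuming the key may also be supplied as an environment variable; `resolve_api_key` is a hypothetical helper, not part of any SDK.

```python
import os
from collections.abc import Mapping


def resolve_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, else from Colab secrets."""
    key = env.get(name)
    if key:
        return key
    try:
        # Only importable when running inside Google Colab.
        from google.colab import userdata
    except ImportError:
        raise RuntimeError(
            f"{name} is not set and Colab secrets are unavailable"
        ) from None
    key = userdata.get(name)
    if not key:
        raise RuntimeError(f"{name} is missing from Colab secrets")
    return key
```

Failing early with a clear message avoids the less obvious `TypeError` that `os.environ[...] = None` would raise when the secret is missing.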


### Preprocessing Input Data
Input data can arrive in several forms, such as images, video, audio, and text, so each input is preprocessed into a common event format before the agents use it.

#### Preprocessing Vision Data
- Vision inputs such as images and MP4 videos are preprocessed with the Gemini API before being passed to the Input Router agent.

In [None]:
import json

# PREPROCESS VISION DATA SUCH AS IMAGES/VIDEOS
def preprocess_vision_events(file_path:str | None= None, timestamp:str | None=None, source:str | None=None):
  '''
  Preprocess a video/image file and return an event dict describing what it shows.

  parameters:
  file_path: image/video filepath
  timestamp: timestamp of the video/image
  source: source of the video/image (e.g. front door camera)

  '''

  if file_path is None:
    file_path = input("Enter vision file path (e.g. CCTV footage at front door): ")
  if timestamp is None:
    timestamp = input("Enter timestamp (e.g. 2025-11-19T03:05:00): ")
  if source is None:
    source = input("Enter source (e.g. front door camera): ")

  # file content
  with open(file_path, "rb") as f:
    file_content = f.read()

  # file type (the MIME type must match the actual format)
  lower_path = file_path.lower()
  if lower_path.endswith((".jpg", ".jpeg")):
    file_type = "image/jpeg"
  elif lower_path.endswith(".png"):
    file_type = "image/png"
  elif lower_path.endswith(".mp4"):
    file_type = "video/mp4"
  else:
    raise ValueError("Unsupported file type")

  file_part = types.Part.from_bytes(data = file_content, mime_type=file_type)

  prompt = """
  You are a smart home vision analyzer.
  Look at this image/video and return a JSON object with:
  {
    "person_present": true/false,
    "num_people": <int>,
    "description": "<short description of what is happening>",
    "is_suspicious": true/false
  }
  Return ONLY valid JSON. Do not include any explanation, comments, or text before or after the JSON.
  """

  response = client.models.generate_content(
      model = 'gemini-2.0-flash',
      contents=[prompt, file_part],
  )

  print("RAW MODEL RESPONSE: ", response.text)

  # strip whitespace and the Markdown code fence around the response
  clean = response.text.strip()
  if clean.startswith("```"):
    start = clean.find("{")
    end = clean.rfind("}") + 1  # last closing brace, so nested objects stay intact
    clean = clean[start:end]

  print("Cleaned response: ", clean)

  # convert json to python dict
  vision_info = json.loads(clean)

  # event object
  event = {
      "timestamp": timestamp,
      "modality": "vision",
      "source": source,
      "raw_text": vision_info.get("description", ""),
      "data":{
          "person_present": vision_info.get("person_present", False),
          "num_people": vision_info.get("num_people", 0),
          "is_suspicious": vision_info.get("is_suspicious", False)
      }
  }
  return event


In [None]:
# testing the vision event
# file_path = "/content/Video_Generation_of_Home_Intrusion.mp4"
# time_stamp = "Wed, 19 Nov 25 10:37:39 +0000"
# source = "Inside house"
# preprocess_vision_events(file_path, time_stamp, source)
preprocess_vision_events()

Enter vision file path (e.g. CCTV footage at front door): /content/Video_Generation_of_Home_Intrusion.mp4
Enter timestamp (e.g. 2025-11-19T03:05:00): Wed, 19 Nov 25 10:37:39 +0000
Enter source (e.g. front door camera): front door camera
RAW MODEL RESPONSE:  ```json
{
  "person_present": true,
  "num_people": 2,
  "description": "A woman is caught stealing and is confronted by the homeowner with a shotgun.",
  "is_suspicious": true
}
```
Cleaned response:  {
  "person_present": true,
  "num_people": 2,
  "description": "A woman is caught stealing and is confronted by the homeowner with a shotgun.",
  "is_suspicious": true
}


{'timestamp': 'Wed, 19 Nov 25 10:37:39 +0000',
 'modality': 'vision',
 'source': 'front door camera',
 'raw_text': 'A woman is caught stealing and is confronted by the homeowner with a shotgun.',
 'data': {'person_present': True, 'num_people': 2, 'is_suspicious': True}}
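
The raw response above arrives wrapped in a Markdown fence, so it has to be unwrapped before parsing. One way to isolate that step is a small helper that takes everything between the first `{` and the last `}`; `extract_json` is a hypothetical name, and the sketch assumes the response contains exactly one top-level JSON object.

```python
import json


def extract_json(text: str) -> dict:
    """Parse the outermost JSON object found in a model response."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model response")
    return json.loads(text[start:end + 1])


raw = '```json\n{"person_present": true, "num_people": 2}\n```'
print(extract_json(raw))  # {'person_present': True, 'num_people': 2}
```

Using the last closing brace rather than the first keeps nested objects intact, and the explicit `ValueError` gives a clearer failure than a `json.JSONDecodeError` on an empty string.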

#### Preprocessing Sound Data
- Sound inputs such as audio files are preprocessed with the Gemini API before being passed to the Input Router agent.

In [None]:
import json

# PREPROCESS SOUND DATA SUCH AS AUDIO FILES
def preprocess_sound_events(file_path:str | None=None, timestamp:str | None=None, source:str | None=None):
  '''
  Preprocess an audio file and return an event dict describing what can be heard in it.

  parameters:
  file_path: audio filepath
  timestamp: timestamp of the audio recording
  source: source of the audio (e.g. kitchen)

  '''
  if file_path is None:
    file_path = input("Enter sound file path (e.g. kitchen_noise.wav): ")
  if timestamp is None:
    timestamp = input("Enter timestamp (e.g. 2025-11-19T03:06:00): ")
  if source is None:
    source = input("Enter source (e.g. kitchen): ")

  # file content
  with open(file_path, "rb") as f:
    file_content = f.read()

  # file type (the MIME type must match the actual format)
  if file_path.endswith(".wav"):
    file_type = "audio/wav"
  elif file_path.endswith(".aiff"):
    file_type = "audio/aiff"
  elif file_path.endswith(".mp4"):
    file_type = "video/mp4"
  elif file_path.endswith(".mp3"):
    file_type = "audio/mpeg"
  else:
    raise ValueError("Unsupported file type")

  audio_part = types.Part.from_bytes(data = file_content, mime_type=file_type)

  prompt = """
  You are a smart home sound analyzer.
  Listen to this audio carefully and return a JSON object with:
  {
    "sound_type": "conversations" | "animal sound (e.g. dog barks)" | "objects sound (e.g. door slam, glass breaking)" | "appliance noises (e.g. refrigerator's hum, hair dryer sound)" | "other",
    "is_loud": true/false,
    "description": "<short description of what the sound is and what is happening in the audio file. Describe expressively and concise but detailed.>",
    "is_suspicious": true/false
  }
  Return ONLY valid JSON. Do not include any explanation, comments, or text before or after the JSON.
  """

  response = client.models.generate_content(
      model = 'gemini-2.0-flash',
      contents=[prompt, audio_part],
  )

  print("RAW MODEL RESPONSE: ", response.text)

  # strip whitespace and the Markdown code fence around the response
  clean = response.text.strip()
  if clean.startswith("```"):
    start = clean.find("{")
    end = clean.rfind("}") + 1  # last closing brace, so nested objects stay intact
    clean = clean[start:end]

  print("Cleaned response: ", clean)

  # convert json to python dict
  sound_info = json.loads(clean)

  # event object
  event = {
      "timestamp": timestamp,
      "modality": "sound",
      "source": source,
      "raw_text": sound_info.get("description", ""),
      "data":{
          "sound_type": sound_info.get("sound_type", "other"),
          "is_loud": sound_info.get("is_loud", False),
          "is_suspicious": sound_info.get("is_suspicious", False)
      }
  }
  return event


In [None]:
# Testing sound event
# audio_file = "/content/665070__roses1401__all-okay.mp3"
# time_stamp = "Wed, 19 Nov 25 10:37:39 +0000"
# source = "Inside kitchen"
# preprocess_sound_events(audio_file, time_stamp, source)
preprocess_sound_events()


Enter sound file path (e.g. kitchen_noise.wav): /content/665070__roses1401__all-okay.mp3
Enter timestamp (e.g. 2025-11-19T03:06:00): Wed, 19 Nov 25 10:37:39 +0000
Enter source (e.g. kitchen): Inside kitchen
RAW MODEL RESPONSE:  ```json
{
  "sound_type": "conversations",
  "is_loud": false,
  "description": "A person is speaking in a somewhat incoherent and rambling manner, laughing occasionally and mentioning customized samples, music, and job opportunities.",
  "is_suspicious": false
}
```
Cleaned response:  {
  "sound_type": "conversations",
  "is_loud": false,
  "description": "A person is speaking in a somewhat incoherent and rambling manner, laughing occasionally and mentioning customized samples, music, and job opportunities.",
  "is_suspicious": false
}


{'timestamp': 'Wed, 19 Nov 25 10:37:39 +0000',
 'modality': 'sound',
 'source': 'Inside kitchen',
 'raw_text': 'A person is speaking in a somewhat incoherent and rambling manner, laughing occasionally and mentioning customized samples, music, and job opportunities.',
 'data': {'sound_type': 'conversations',
  'is_loud': False,
  'is_suspicious': False}}
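
Both preprocessing functions map file extensions to MIME types by hand. A shared lookup table is one way to keep the two in sync. This is a sketch under assumptions: `MIME_TYPES` and `guess_mime_type` are hypothetical names, and the entries are standard media types that should still be checked against the formats the model accepts.

```python
from pathlib import Path

# Extension-to-MIME table covering the formats used in this notebook.
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".mp4": "video/mp4",
    ".wav": "audio/wav",
    ".aiff": "audio/aiff",
    ".mp3": "audio/mpeg",
}


def guess_mime_type(file_path: str) -> str:
    """Return the MIME type for a supported file, ignoring extension case."""
    suffix = Path(file_path).suffix.lower()
    try:
        return MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(
            f"Unsupported file type: {suffix or file_path!r}"
        ) from None
```

Lower-casing the suffix means files such as `clip.MP4` are accepted, which plain `endswith` checks would reject.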

#### Preprocessing Sensor Data
Sensor readings such as gas level, temperature, and water leaks are set manually. The Hazard agent will use them to detect hazardous conditions in the home. Other sensor data such as door lock, motion, and human presence are also set manually so the Security agent can detect security threats.

These values are hard-coded for now because live readings are only available from real smart-home hardware.

In [None]:
# THESE EVENTS WILL BE USED BY THE HAZARD AI AGENT
# THE DATA ARE SET MANUALLY. IN THE FUTURE, READINGS FROM REAL SMART-HOME DEVICES WILL BE USED INSTEAD
gas_event = {
    "timestamp": "2025-11-19T03:07:00",
    "modality": "sensor",
    "source": "gas_sensor_kitchen",
    "raw_text": "",
    "data": {"gas_level": 0.85},
}
temp_event = {
    "timestamp": "2025-11-19T03:08:00",
    "modality": "sensor",
    "source": "temp_sensor_living_room",
    "raw_text": "",
    "data": {"temperature_c": 32.0},
}
water_event = {
    "timestamp": "2025-11-19T03:09:00",
    "modality": "sensor",
    "source": "water_leak_sensor_bathroom",
    "raw_text": "",
    "data": {"water_leak": False},
}

In [None]:
# THESE EVENTS WILL BE USED BY THE SECURITY AI AGENT
# THE DATA ARE SET MANUALLY. IN THE FUTURE, READINGS FROM REAL SMART-HOME DEVICES WILL BE USED INSTEAD
door_event = {
    "timestamp": "2025-11-19T03:10:00",
    "modality": "sensor",
    "source": "door_sensor_front_door",
    "raw_text": "Front door opened.",
    "data": {
        "door": "front",
        "event": "open",
        "is_night": True,
    },
}
motion_event = {
    "timestamp": "2025-11-19T03:11:00",
    "modality": "sensor",
    "source": "motion_sensor_backyard",
    "raw_text": "Motion detected in the backyard.",
    "data": {
        "motion_detected": True,
        "area": "backyard",
    },
}
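
Because hand-written events are easy to get subtly wrong, a quick schema check before routing can catch missing keys. This is a sketch, not part of the original pipeline: `validate_event` is a hypothetical helper, and the required keys and modalities are inferred from the events defined in this notebook.

```python
REQUIRED_KEYS = {"timestamp", "modality", "source", "raw_text", "data"}
KNOWN_MODALITIES = {"vision", "sound", "sensor"}


def validate_event(event: dict) -> list[str]:
    """Return a list of schema problems; an empty list means no problems found."""
    problems = []
    missing = REQUIRED_KEYS - set(event)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if event.get("modality") not in KNOWN_MODALITIES:
        problems.append(f"unknown modality: {event.get('modality')!r}")
    if not isinstance(event.get("data"), dict):
        problems.append("data must be a dict")
    return problems


example = {
    "timestamp": "2025-11-19T03:11:00",
    "modality": "sensor",
    "source": "motion_sensor_backyard",
    "raw_text": "Motion detected in the backyard.",
    "data": {"motion_detected": True, "area": "backyard"},
}
print(validate_event(example))  # []
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a malformed event at once.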

### Combining All Vision, Sound, and Sensor Input Data

All vision, sound, and sensor events are combined into one list, sorted chronologically by timestamp, and saved to `scenario_events.json`.
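
Sorting on the raw `timestamp` string is only chronological when every event uses the same ISO 8601 layout; a value such as `Wed, 19 Nov 25 10:37:39 +0000` (used in the earlier test runs) would sort alphabetically instead. A hedged sketch of a sort key that parses both layouts; `parse_timestamp` is a hypothetical helper, and treating timestamps without a timezone as UTC is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 or RFC 2822 timestamp into a UTC datetime."""
    try:
        dt = datetime.fromisoformat(ts)
    except ValueError:
        dt = parsedate_to_datetime(ts)
    if dt.tzinfo is None:
        # Assumption: timestamps without a timezone are treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


events = [
    {"timestamp": "2025-11-19T03:07:00"},
    {"timestamp": "Wed, 19 Nov 25 02:00:00 +0000"},
]
ordered = sorted(events, key=lambda e: parse_timestamp(e["timestamp"]))
```

In this example `ordered` starts with the 02:00 event, whereas a plain string sort would place the `2025-...` entry first.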

In [None]:
scenario_events = []

# add the vision and sound events (each call prompts for a file path, timestamp, and source)
scenario_events.append(preprocess_vision_events())
scenario_events.append(preprocess_sound_events())

scenario_events.append(gas_event)
scenario_events.append(temp_event)
scenario_events.append(water_event)

scenario_events.append(door_event)
scenario_events.append(motion_event)

scenario_events = sorted(scenario_events, key=lambda x: x['timestamp'])

with open("scenario_events.json", "w") as f:
  json.dump(scenario_events, f, indent=2)

print("Saved", len(scenario_events), "events. ")

Enter vision file path (e.g. CCTV footage at front door): /content/CCTV_Footage_of_Man_Entering_House.mp4
Enter timestamp (e.g. 2025-11-19T03:05:00): 2025-09-19T03:10:00
Enter source (e.g. front door camera): front door camera
RAW MODEL RESPONSE:  ```json
{
  "person_present": true,
  "num_people": 1,
  "description": "An elderly man approaches the door, unlocks it with a key, and enters the house carrying a briefcase.",
  "is_suspicious": false
}
```
Cleaned response:  {
  "person_present": true,
  "num_people": 1,
  "description": "An elderly man approaches the door, unlocks it with a key, and enters the house carrying a briefcase.",
  "is_suspicious": false
}
Enter sound file path (e.g. kitchen_noise.wav): /content/591459__wakabaclamp__barking-dog.mp3
Enter timestamp (e.g. 2025-11-19T03:06:00): 2025-12-19T03:10:00
Enter source (e.g. kitchen): living room
RAW MODEL RESPONSE:  ```json
{
  "sound_type": "animal sound (e.g. dog barks)",
  "is_loud": true,
  "description": "The audio fea

### Creating a Custom Function Tool for Input Router AI Agent
A custom function tool is created so that the Input Router AI Agent can fetch the scenario events one at a time.

In [None]:
from google.adk.agents import Agent
from google.adk.models.google_llm import Gemini
from google.adk.runners import InMemoryRunner, Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools import FunctionTool

In [None]:
# CREATING A CUSTOM FUNCTION TOOL FOR SCENARIO EVENTS
with open("scenario_events.json", "r") as f:
  SCENARIO_EVENTS = json.load(f)

EVENT_INDEX = {"current": 0}

def get_next_event():
  """
    Return the next preprocessed smart-home event from scenario_events.json.
    When no more events are left, returns {"done": True}.
  """
  if EVENT_INDEX["current"] >= len(SCENARIO_EVENTS):
      return {"done": True}
  ev = SCENARIO_EVENTS[EVENT_INDEX["current"]]
  EVENT_INDEX["current"] += 1
  return ev

event_tool = FunctionTool(func=get_next_event)
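
The tool above keeps its position in a module-level dict, so replaying the scenario means resetting `EVENT_INDEX` by hand. One alternative is to hold the position in a closure. This is a sketch of that design choice rather than the notebook's method, and `make_event_source` is a hypothetical name.

```python
from collections.abc import Callable


def make_event_source(events: list[dict]) -> Callable[[], dict]:
    """Return a zero-argument function that yields events one call at a time."""
    iterator = iter(events)

    def get_next_event() -> dict:
        """Return the next event, or {"done": True} when none are left."""
        return next(iterator, {"done": True})

    return get_next_event


source = make_event_source([{"source": "gas_sensor_kitchen"}])
print(source())  # {'source': 'gas_sensor_kitchen'}
print(source())  # {'done': True}
```

Calling `make_event_source` again gives a fresh cursor, so each test run can start from the first event without touching shared state.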

### Input Router AI Agent


In [None]:
input_router_agent = Agent(
    name="input_router_agent",
    model=Gemini(
        model="gemini-2.5-flash-lite"
    ),
    description="Routes smart-home events to Hazard or Security agents.",
    # ADK's Agent takes `instruction` (singular)
    instruction="""
    You are the Input Router Agent in a smart-home multi-agent system.
    You can call the tool get_next_event() to fetch one event at a time.
    Each event has this JSON structure: {timestamp, modality, source, raw_text, data}.

    For each event, classify it into exactly one of:
      - "hazard": gas leak, abnormal temperature, water leak, dangerous conditions
      - "security": unknown person, suspicious motion/sound, door open at night
      - "ignore": normal activity, non-dangerous events

    After you see an event from get_next_event(), respond ONLY with JSON:
    {
      "target_agent": "hazard" | "security" | "ignore",
      "reason": "short explanation"
    }
    When get_next_event() returns {"done": true}, stop.
    """,
    tools=[event_tool],
)
print("Input Router Agent defined.")
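
The router is asked to reply with a small JSON object, so downstream code should check that reply before acting on it. A minimal sketch, assuming the response text may be wrapped in a Markdown fence as the Gemini outputs earlier were; `parse_routing_decision` and `ALLOWED_TARGETS` are hypothetical names.

```python
import json

ALLOWED_TARGETS = {"hazard", "security", "ignore"}


def parse_routing_decision(text: str) -> dict:
    """Extract and validate the router's JSON decision from its response text."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object in router response")
    decision = json.loads(text[start:end + 1])
    target = decision.get("target_agent")
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"Unexpected target_agent: {target!r}")
    return {"target_agent": target, "reason": str(decision.get("reason", ""))}


reply = '```json\n{"target_agent": "hazard", "reason": "High gas level"}\n```'
print(parse_routing_decision(reply))
# {'target_agent': 'hazard', 'reason': 'High gas level'}
```

Rejecting unknown labels here keeps a malformed model reply from silently reaching the Hazard or Security agent.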