<a href="https://colab.research.google.com/github/kyileiaye2021/SafeHome_AI/blob/main/SafeHome_AI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SafeHome AI
### Challenge:
Despite the growing adoption of smart home technologies, current systems still struggle with reliably detecting emergency hazards, such as gas/water leaks when no one is home, and accurately identifying potential security threats, often confusing homeowners with intruders. These limitations reduce trust in smart home automation and can lead to serious safety risks.

### Goal:
Develop a multi-agent AI system capable of analyzing diverse smart-home sensor inputs in real time to accurately detect hazards and security breaches, and deliver real-time alerts to the homeowner. The system should improve reliability, reduce false alarms, and enhance overall safety.

### Multi-Agent System Architecture

- Input Routing Agent
- Hazard Agent
  - air quality/pollution API, weather API, local alert/emergency risk API
- Security Agent
  - vision camera tool for person detection, door lock/alarm control function tool, homeowner's geolocation (possibly via the Google Maps API)
- Coordinator Agent

### Installing Google Agent Development Kit (ADK)
This project uses the Google Agent Development Kit (ADK) to define and run its agents.

In [None]:
!pip install google-adk



### Setting up Google API key

In [None]:
from google import genai
from google.genai import types
from google.colab import userdata
import os

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
print("Setup and authentication complete.")

Setup and authentication complete.


### Preprocessing Input Data
Input data can be of any type, such as images, text, or videos, so it is preprocessed into a common format for later use by the agents.
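
The preprocessing functions and hand-written sensor events in this notebook all share one event shape: `timestamp`, `modality`, `source`, `raw_text`, and `data`. As a minimal sketch, a small helper could build that envelope in one place. `make_event` is a hypothetical name that does not appear in the notebook.

```python
from typing import Optional


def make_event(timestamp: str, modality: str, source: str,
               raw_text: str = "", data: Optional[dict] = None) -> dict:
    """Build the common event envelope (hypothetical helper, not part of the notebook)."""
    return {
        "timestamp": timestamp,
        "modality": modality,      # "vision", "sound", or "sensor"
        "source": source,
        "raw_text": raw_text,
        "data": dict(data or {}),  # copy so callers cannot mutate a shared dict
    }


# Same values as the gas sensor event defined later in the notebook.
gas_example = make_event("2025-11-19T03:07:00", "sensor", "gas_sensor_kitchen",
                         data={"gas_level": 0.85})
print(gas_example)
```

Building every event through one function keeps the keys consistent, which matters because the router prompt describes exactly this structure.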

#### Preprocessing Vision Data
- Vision inputs such as images and MP4 videos are preprocessed with the Gemini API before a request is sent to the Input Router AI Agent.

In [None]:
import json

# PREPROCESS VISION DATA SUCH AS IMAGES/VIDEOS
def preprocess_vision_events(file_path:str | None= None, timestamp:str | None=None, source:str | None=None):
  '''
  Preprocess an image/video file with Gemini and return an event dict describing what it shows.

  parameters:
  file_path: image/video filepath
  timestamp: timestamp of the video/image
  source: source of the video/image (e.g. front door camera)

  '''

  if file_path is None:
    file_path = input("Enter vision file path (e.g. CCTV footage at front door): ")
  if timestamp is None:
    timestamp = input("Enter timestamp (e.g. 2025-11-19T03:05:00): ")
  if source is None:
    source = input("Enter source (e.g. front door camera): ")

  # file content
  with open(file_path, "rb") as f:
    file_content = f.read()

  # file type (extension matched case-insensitively; PNG gets its own MIME type)
  lower_path = file_path.lower()
  if lower_path.endswith((".jpg", ".jpeg")):
    file_type = "image/jpeg"
  elif lower_path.endswith(".png"):
    file_type = "image/png"
  elif lower_path.endswith(".mp4"):
    file_type = "video/mp4"
  else:
    raise ValueError("Unsupported file type")

  file_part = types.Part.from_bytes(data = file_content, mime_type=file_type)

  prompt = """
  You are a smart home vision analyzer.
  Look at this image/video and return a JSON object with:
  {
    "person_present": true/false,
    "num_people": <int>,
    "description": "<short description of what is happening>",
    "is_suspicious": true/false
  }
  Return ONLY valid JSON. Do not include any explanation, comments, or text before or after the JSON.
  """

  response = client.models.generate_content(
      model = 'gemini-2.0-flash',
      contents=[prompt, file_part],
  )

  print("RAW MODEL RESPONSE: ", response.text)

  # strip whitespace and any Markdown code fence wrapped around the JSON
  clean = response.text.strip()
  if clean.startswith("```"):
    start = clean.find("{")
    end = clean.rfind("}") + 1  # rfind keeps the outermost closing brace
    clean = clean[start:end]

  print("Cleaned response: ", clean)

  # convert json to python dict
  vision_info = json.loads(clean)

  # event object
  event = {
      "timestamp": timestamp,
      "modality": "vision",
      "source": source,
      "raw_text": vision_info.get("description", ""),
      "data":{
          "person_present": vision_info.get("person_present", False),
          "num_people": vision_info.get("num_people", 0),
          "is_suspicious": vision_info.get("is_suspicious", False)
      }
  }
  return event


In [None]:
# testing the vision event
# file_path = "/content/Video_Generation_of_Home_Intrusion.mp4"
# time_stamp = "Wed, 19 Nov 25 10:37:39 +0000"
# source = "Inside house"
# preprocess_vision_events(file_path, time_stamp, source)
# preprocess_vision_events()

#### Preprocessing Sound Data
- Sound inputs such as audio files are preprocessed with the Gemini API before a request is sent to the Input Router AI Agent.

In [None]:
import json

# PREPROCESS SOUND DATA SUCH AS AUDIO FILES
def preprocess_sound_events(file_path:str | None=None, timestamp:str | None=None, source:str | None=None):
  '''
  Preprocess an audio file with Gemini and return an event dict describing the sound.

  parameters:
  file_path: audio filepath
  timestamp: timestamp of the audio recording
  source: source of the audio (e.g. kitchen)

  '''
  if file_path is None:
    file_path = input("Enter sound file path (e.g. kitchen_noise.wav): ")
  if timestamp is None:
    timestamp = input("Enter timestamp (e.g. 2025-11-19T03:06:00): ")
  if source is None:
    source = input("Enter source (e.g. kitchen): ")

  # file content
  with open(file_path, "rb") as f:
    file_content = f.read()

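  # file type (extension matched case-insensitively; AIFF gets its own MIME type)
  lower_path = file_path.lower()
  if lower_path.endswith(".wav"):
    file_type = "audio/wav"
  elif lower_path.endswith(".aiff"):
    file_type = "audio/aiff"
  elif lower_path.endswith(".mp4"):
    file_type = "video/mp4"
  elif lower_path.endswith(".mp3"):
    file_type = "audio/mpeg"
  else:
    raise ValueError("Unsupported file type")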

  audio_part = types.Part.from_bytes(data = file_content, mime_type=file_type)

  prompt = """
  You are a smart home sound analyzer.
  Listen to this audio carefully and return a JSON object with:
  {
    "sound_type": "conversations" | "animal sound (e.g. dog barks)" | "objects sound (e.g. door slam, glass breaking)" | "appliance noises (e.g. refrigerator's hum, hair dryer sound)" | "other",
    "is_loud": true/false,
    "description": "<short description of what the sound is and what is happening in the audio file. Describe expressively and concise but detailed.>",
    "is_suspicious": true/false
  }
  Return ONLY valid JSON. Do not include any explanation, comments, or text before or after the JSON.
  """

  response = client.models.generate_content(
      model = 'gemini-2.0-flash',
      contents=[prompt, audio_part],
  )

  print("RAW MODEL RESPONSE: ", response.text)

  # strip whitespace and any Markdown code fence wrapped around the JSON
  clean = response.text.strip()
  if clean.startswith("```"):
    start = clean.find("{")
    end = clean.rfind("}") + 1  # rfind keeps the outermost closing brace
    clean = clean[start:end]

  print("Cleaned response: ", clean)

  # convert json to python dict
  sound_info = json.loads(clean)

  # event object
  event = {
      "timestamp": timestamp,
      "modality": "sound",
      "source": source,
      "raw_text": sound_info.get("description", ""),
      "data":{
          "sound_type": sound_info.get("sound_type", "other"),
          "is_loud": sound_info.get("is_loud", False),
          "is_suspicious": sound_info.get("is_suspicious", False)
      }
  }
  return event


In [None]:
# Testing sound event
# audio_file = "/content/665070__roses1401__all-okay.mp3"
# time_stamp = "Wed, 19 Nov 25 10:37:39 +0000"
# source = "Inside kitchen"
# preprocess_sound_events(audio_file, time_stamp, source)
preprocess_sound_events()


Enter sound file path (e.g. kitchen_noise.wav): /content/665070__roses1401__all-okay.mp3
Enter timestamp (e.g. 2025-11-19T03:06:00): Wed, 19 Nov 25 10:37:39 +0000
Enter source (e.g. kitchen): Inside kitchen
RAW MODEL RESPONSE:  ```json
{
  "sound_type": "conversations",
  "is_loud": false,
  "description": "A person is speaking in a somewhat incoherent and rambling manner, laughing occasionally and mentioning customized samples, music, and job opportunities.",
  "is_suspicious": false
}
```
Cleaned response:  {
  "sound_type": "conversations",
  "is_loud": false,
  "description": "A person is speaking in a somewhat incoherent and rambling manner, laughing occasionally and mentioning customized samples, music, and job opportunities.",
  "is_suspicious": false
}


{'timestamp': 'Wed, 19 Nov 25 10:37:39 +0000',
 'modality': 'sound',
 'source': 'Inside kitchen',
 'raw_text': 'A person is speaking in a somewhat incoherent and rambling manner, laughing occasionally and mentioning customized samples, music, and job opportunities.',
 'data': {'sound_type': 'conversations',
  'is_loud': False,
  'is_suspicious': False}}

#### Preprocessing Sensor Data
Sensor readings such as gas level, temperature, and water leaks are set manually. The Hazard AI Agent uses them to protect the home from potential hazards. Other sensor data, such as door lock, motion, and human presence, are set the same way for the Security AI Agent to protect the home from security threats.

These values are hard-coded for now because real-time readings are only available from actual smart-home hardware.

In [None]:
# THESE EVENTS WILL BE USED BY THE HAZARD AI AGENT
# THE DATA ARE SET MANUALLY. IN THE FUTURE, READINGS FROM REAL SMART-HOME DEVICES WILL BE USED INSTEAD
gas_event = {
    "timestamp": "2025-11-19T03:07:00",
    "modality": "sensor",
    "source": "gas_sensor_kitchen",
    "raw_text": "",
    "data": {"gas_level": 0.85},
}
temp_event = {
    "timestamp": "2025-11-19T03:08:00",
    "modality": "sensor",
    "source": "temp_sensor_living_room",
    "raw_text": "",
    "data": {"temperature_c": 32.0},
}
water_event = {
    "timestamp": "2025-11-19T03:09:00",
    "modality": "sensor",
    "source": "water_leak_sensor_bathroom",
    "raw_text": "",
    "data": {"water_leak": False},
}

In [None]:
# THESE EVENTS WILL BE USED BY THE SECURITY AI AGENT
# THE DATA ARE SET MANUALLY. IN THE FUTURE, READINGS FROM REAL SMART-HOME DEVICES WILL BE USED INSTEAD
door_event = {
    "timestamp": "2025-11-19T03:10:00",
    "modality": "sensor",
    "source": "door_sensor_front_door",
    "raw_text": "Front door opened.",
    "data": {
        "door": "front",
        "event": "open",
        "is_night": True,
    },
}
motion_event = {
    "timestamp": "2025-11-19T03:11:00",
    "modality": "sensor",
    "source": "motion_sensor_backyard",
    "raw_text": "Motion detected in the backyard.",
    "data": {
        "motion_detected": True,
        "area": "backyard",
    },
}

### Combining All Vision, Sound, and Sensor Input Data

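All vision, sound, and sensor events are sorted in chronological order by timestamp. The sort compares the timestamp strings directly, so the order is chronological only when every timestamp uses the same ISO 8601 format.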
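
The sample runs in this notebook enter timestamps in more than one format, for example `2025-11-19T03:07:00`, `2025-11-22T22:46:41+00:00`, and `Wed, 19 Nov 25 10:37:39 +0000`, and a plain string sort does not order these chronologically. The following is a hedged sketch of a sort key that parses both styles. `parse_event_time` is a hypothetical helper, and treating timestamps without an offset as UTC is an assumption made only for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_event_time(ts: str) -> datetime:
    """Parse an ISO 8601 or RFC 2822-style timestamp into an aware datetime."""
    try:
        dt = datetime.fromisoformat(ts)
    except ValueError:
        # Fall back to the e-mail date format, e.g. "Wed, 19 Nov 25 10:37:39 +0000"
        dt = parsedate_to_datetime(ts)
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


events = [
    {"timestamp": "2025-11-22T22:46:41+00:00", "source": "inside house"},
    {"timestamp": "Wed, 19 Nov 25 10:37:39 +0000", "source": "Inside kitchen"},
    {"timestamp": "2025-11-19T03:07:00", "source": "gas_sensor_kitchen"},
]

ordered = sorted(events, key=lambda e: parse_event_time(e["timestamp"]))
ordered_sources = [e["source"] for e in ordered]
print(ordered_sources)
```

Normalizing everything to timezone-aware values also avoids the `TypeError` Python raises when naive and aware datetimes are compared.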

In [None]:
scenario_events = []

# add the vision and sound events (each call prompts for file path, timestamp, and source)
scenario_events.append(preprocess_vision_events())
scenario_events.append(preprocess_sound_events())

scenario_events.append(gas_event)
scenario_events.append(temp_event)
scenario_events.append(water_event)

scenario_events.append(door_event)
scenario_events.append(motion_event)

scenario_events = sorted(scenario_events, key=lambda x: x['timestamp'])

with open("scenario_events.json", "w") as f:
  json.dump(scenario_events, f, indent=2)

print("Saved", len(scenario_events), "events. ")

Enter vision file path (e.g. CCTV footage at front door): /content/Video_Generation_of_Home_Intrusion.mp4
Enter timestamp (e.g. 2025-11-19T03:05:00): 2025-11-22T22:46:41+00:00
Enter source (e.g. front door camera): inside house
RAW MODEL RESPONSE:  ```json
{
  "person_present": true,
  "num_people": 2,
  "description": "A woman is caught by a man with a shotgun while stealing jewelry from a home.",
  "is_suspicious": true
}
```
Cleaned response:  {
  "person_present": true,
  "num_people": 2,
  "description": "A woman is caught by a man with a shotgun while stealing jewelry from a home.",
  "is_suspicious": true
}
Enter sound file path (e.g. kitchen_noise.wav): /content/500988__simoneyoh3998__sound-dryer-running-from-front.wav
Enter timestamp (e.g. 2025-11-19T03:06:00): 2025-07-22T22:46:41+00:00
Enter source (e.g. kitchen): Bathroom
RAW MODEL RESPONSE:  ```json
{
  "sound_type": "appliance noises (e.g. refrigerator's hum, hair dryer sound)",
  "is_loud": true,
  "description": "The aud

### Creating a Custom Function Tool for Input Router AI Agent
A custom function tool is created to send scenario events to the Input Router AI Agent.

In [None]:
from google.adk.agents import LlmAgent
from google.adk.models.google_llm import Gemini
from google.adk.runners import InMemoryRunner
from google.adk.tools import FunctionTool
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService



In [None]:
# CREATING A CUSTOM FUNCTION TOOL FOR SCENARIO EVENTS
with open("scenario_events.json", "r") as f:
  SCENARIO_EVENTS = json.load(f)

EVENT_INDEX = {"current": 0}

def get_next_event():
  """
  Return the next preprocessed smart-home event from scenario_events.json.
  When no more events are left, returns {"done": True}.
  """
  if EVENT_INDEX["current"] >= len(SCENARIO_EVENTS):
    return {"done": True}
  ev = SCENARIO_EVENTS[EVENT_INDEX["current"]]
  EVENT_INDEX["current"] += 1
  return ev

event_tool = FunctionTool(func=get_next_event)
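
The cursor pattern used by `get_next_event` can be tried without ADK or a JSON file. The sketch below wraps the same index-and-advance logic in a closure instead of a module-level dict. `make_event_cursor` and the two sample events are illustrative assumptions, not part of the notebook.

```python
from typing import Callable


def make_event_cursor(events: list) -> Callable[[], dict]:
    """Return a function that yields one event per call, then {"done": True}."""
    index = 0

    def next_event() -> dict:
        nonlocal index
        if index >= len(events):
            return {"done": True}
        event = events[index]
        index += 1
        return event

    return next_event


# Hypothetical sample data, shaped like the notebook's sensor events
sample_events = [
    {"modality": "sensor", "source": "gas_sensor_kitchen"},
    {"modality": "sensor", "source": "door_sensor_front_door"},
]

next_event = make_event_cursor(sample_events)
results = [next_event() for _ in range(3)]
print(results[-1])  # the third call finds no events left
```

A closure gives each scenario its own cursor, so re-running a test does not require resetting shared state by hand.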

In [None]:
# CREATING A CUSTOM FUNCTION TOOL FOR USER QUERY
def get_user_event(query: str) -> str:
  """Return the user's natural-language query unchanged so the router can classify it."""
  return query
user_event = FunctionTool(func=get_user_event)

### User Command
The user's command is passed to the Input Router AI Agent.

In [None]:
# from datetime import datetime, timezone

# def make_user_event(user_text:str):
#   return {
#         "timestamp": datetime.now(timezone.utc).isoformat(),
#         "modality": "user",
#         "source": "user",
#         "raw_text": user_text,
#         "data": {}
#     }
# {
#   "timestamp": "...",
#   "modality": "user",
#   "source": "user",
#   "raw_text": "the user's natural language message",
#   "data": { }
# }

### Input Router AI Agent


In [None]:
input_router_agent = LlmAgent(
    name="input_router_agent",
    model=Gemini(
        model= "gemini-2.5-flash-lite"
    ),
    description="Routes smart-home events to Hazard or Security agents.",
    instruction="""
You are the Input Router Agent in a smart-home multi-agent system.

You can call the tool get_next_event() to fetch one event at a time.
Each event has the following JSON structure:
{
  "timestamp": "...",
  "modality": "vision" | "sound" | "sensor" | "system",
  "source": "...",
  "raw_text": "...",
  "data": { ... }
}

For environment events, classify them into EXACTLY one of:
- "hazard": gas leak, abnormal temperature change, water leak, dangerous conditions
- "security": unknown person, suspicious motion/sound, door open at night
- "ignore": normal activity or non-dangerous events

You may also receive USER events directly (via get_user_event).

For USER events, if the user query is not None:
- If the user is asking the system to do something like "turn off the lights", "lock the door", classify as "command".
- If the user is asking about hazards (e.g. "any hazards right now?"), classify as "hazard".
- If the user is asking about security (e.g. "is my home secure?"), classify as "security".
- Otherwise classify as "ignore".


If the user query is None:
- Process the events (via get_next_event). If get_next_event() returns {"done": true}, stop processing.

After receiving an event, respond ONLY with valid JSON:
{
  "target_agent": "hazard" | "security" | "ignore" | "command",
  "reason": "short explanation of why you chose this category"
}
""",
    tools=[user_event, event_tool],
)

print("✅ Input Router Agent defined.")



✅ Input Router Agent defined.


### Testing Input Router Agent
The ADK Runner is used to test the Input Router Agent.

In [None]:
# # runner, session, app name, and user id will be defined
# APP_NAME = "smart_home_app"
# USER_ID = "demo_user"
# SESSION_ID = "session_1"

# session_service = InMemorySessionService()

# # Create a session
# session = session_service.create_session(
#     app_name=APP_NAME,
#     user_id = USER_ID,
#     session_id = SESSION_ID
# )

# # Create a runner
# runner = Runner(
#     agent=input_router_agent,
#     app_name=APP_NAME,
#     session_service=session_service
# )

In [None]:
# def parse_router_output(raw_text: str) -> dict:
#     clean = raw_text.strip()
#     if clean.startswith("```"):
#         start = clean.find("{")
#         end = clean.rfind("}") + 1
#         clean = clean[start:end]
#     return json.loads(clean)

In [None]:
# DEFINING SESSION, APP NAME, AND RUNNER

USER_ID = "demo_user"
SESSION_ID = "session_7"
session_service = InMemorySessionService()
APP_NAME = "smart_home_app"

session = await session_service.create_session(
        app_name=APP_NAME,
        user_id=USER_ID,
        session_id=SESSION_ID
    )

runner = Runner(
    agent=input_router_agent,
    session_service=session_service,
    app_name=APP_NAME
)

In [None]:
async def run_query(query: str | None=None):
    # session = await session_service.create_session(
    #     app_name=APP_NAME,
    #     user_id=USER_ID,
    #     session_id=SESSION_ID
    # )

    new_message_content = None
    if query:
        # Create content from user query in ADK format
        new_message_content = types.Content(
            role="user",
            parts=[types.Part.from_text(text=query)]
        )
    else:
        # If no query, provide a synthetic message to kick off the agent
        # The agent's instruction says: "If the user query is None: - Process the events (via get_next_event)"
        # So, we provide a non-empty user message to satisfy the runner's requirement and enable tool calls.
        new_message_content = types.Content(
            role="user",
            parts=[types.Part.from_text(text="Process next smart-home event.")]
        )

    # Run the agent with runner.run_async(), which returns an async generator of events.
    # The events are consumed below with 'async for'.
    events = runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID,
        new_message=new_message_content,
    )

    async for event in events:
        if event.is_final_response():
            if event.content and event.content.parts:
                for part in event.content.parts:
                    if part.text:
                        return part.text

    return "No final response received."

In [None]:
response = await run_query("Is my house secure right now?")
print(f"Agent Response: {response}")

Agent Response: The user is asking about the security of their house. This should be routed to the security agent.
{
  "target_agent": "security",
  "reason": "The user is asking a question about the security of their house."
}
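
The sample reply above contains a sentence of prose before the JSON object, even though the instruction asks for JSON only, so calling `json.loads` on the full text would fail. Below is a minimal sketch of a more tolerant parser, assuming the reply contains one routing object somewhere in the text. `parse_router_reply` and `ALLOWED_TARGETS` are hypothetical names, and the allowed values are copied from the router instruction.

```python
import json

# Categories listed in the router instruction
ALLOWED_TARGETS = {"hazard", "security", "ignore", "command"}


def parse_router_reply(raw_text: str) -> dict:
    """Find the first JSON object in raw_text that carries a valid target_agent."""
    decoder = json.JSONDecoder()
    start = raw_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            obj, _ = decoder.raw_decode(raw_text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and obj.get("target_agent") in ALLOWED_TARGETS:
            return obj
        start = raw_text.find("{", start + 1)
    raise ValueError("No valid routing decision found in reply")


reply = """The user is asking about the security of their house. This should be routed to the security agent.
{
  "target_agent": "security",
  "reason": "The user is asking a question about the security of their house."
}"""

decision = parse_router_reply(reply)
print(decision["target_agent"])
```

Checking `target_agent` against the allowed set also catches replies that are valid JSON but name a category the router was never told about.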


### Hazard Agent
After the Input Router Agent classifies an event as a hazard, the Hazard Agent handles it. It covers hazards such as gas leaks, water leaks, and sudden temperature changes, and alerts the user. In emergencies it can act automatically, for example by turning off the main power system during a gas leak or by adjusting the indoor temperature after a sudden change in the weather.

For this prototype, we assume the smart home is located in Los Angeles (hard-coded lat/lon). In a real deployment, this would be dynamically configured.

In [None]:
# Example home location (Los Angeles).
HOME_LAT = 34.05
HOME_LON = -118.24

In [11]:
# Add a weather API
import requests

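def get_outdoor_temp(lat: float = HOME_LAT, lon: float = HOME_LON) -> dict:
  """
  Get the current outdoor temperature in degrees Celsius from the Open-Meteo weather API.

  Defaults to the home location so the tool can be called without arguments.
  Returns a dict of the form {"outside_temp_c": <number>}.
  """
  url = "https://api.open-meteo.com/v1/forecast"
  params = {
      "latitude": lat,
      "longitude": lon,
      "current_weather": True,
  }
  # timeout keeps the agent from hanging if the weather service is unreachable
  resp = requests.get(url, params=params, timeout=10)
  resp.raise_for_status()
  data = resp.json()
  curr_temp = data["current_weather"]["temperature"]
  return {"outside_temp_c": curr_temp}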


In [12]:
# testing weather api
print(get_outdoor_temp(HOME_LAT, HOME_LON))

{'outside_temp_c': 21.0}


### Taking Immediate Actions in Emergency Cases


In [13]:
def apply_hazard_actions(actions: list[str]) -> dict:
    """
    Simulate applying hazard actions (no real hardware).
    """
    print("[Simulated] Applying hazard actions:", actions)
    return {"status": "ok", "applied": actions}

In [None]:
HAZARD_INSTRUCTION = """
You are the Hazard Agent in a smart-home system.

You receive ONE environment event as JSON, already classified as "hazard" by the router.
Example structure:
{
  "timestamp": "...",
  "modality": "sensor",
  "source": "...",
  "raw_text": "...",
  "data": {
    "gas_level": <0.0–1.0 or null>,
    "water_leak": true/false or null,
    "temperature_c": <number or null>
  }
}

You have access to these tools:
- get_outdoor_temp(): returns {"outside_temp_c": <number>} for the home's location.
- apply_hazard_actions(actions: list[str]): simulate applying actions (no real hardware).

Your job:
1. Decide the hazard type: "gas_leak", "water_leak", "temp_anomaly", or "other".
2. Decide the severity: "low", "medium", or "high".
3. For gas leaks and serious water leaks, you may take immediate actions and call apply_hazard_actions.
4. For temperature anomalies:
   - Call get_outdoor_temp() to get the outside temperature.
   - Compare inside vs outside temp.
   - Prefer to recommend actions and wait for human approval instead of acting immediately.
5. Decide:
   - should_take_immediate_action: true/false
   - needs_human_approval: true/false
   - proposed_actions: list of actions you recommend
   - auto_actions: list of actions you already took (if any, often [] for temp anomalies)
   - notify_owner: true/false
   - user_message: short explanation and, for temp anomalies, a QUESTION asking owner to approve or reject.

Always respond ONLY with valid JSON, no extra text:
{
  "hazard_type": "gas_leak" | "water_leak" | "temp_anomaly" | "other",
  "severity": "low" | "medium" | "high",
  "should_take_immediate_action": true/false,
  "needs_human_approval": true/false,
  "proposed_actions": ["...", "..."],
  "auto_actions": ["...", "..."],
  "notify_owner": true/false,
  "user_message": "..."
}
"""

hazard_agent = LlmAgent(
    name="hazard_agent",
    model=Gemini(model="gemini-2.5-flash-lite"),
    description="Analyzes hazard events (gas, water, temp) and decides actions.",
    instruction=HAZARD_INSTRUCTION,
    tools=[get_outdoor_temp, apply_hazard_actions],
)
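
The instruction above fixes the keys and allowed values of the Hazard Agent's JSON reply, so the reply can be checked before anything acts on it. Here is a minimal sketch of such a check. `validate_hazard_decision` is a hypothetical helper, and the sample decision is made-up illustration data, not real agent output.

```python
# Allowed values copied from the Hazard Agent instruction
HAZARD_TYPES = {"gas_leak", "water_leak", "temp_anomaly", "other"}
SEVERITIES = {"low", "medium", "high"}
BOOL_FIELDS = ("should_take_immediate_action", "needs_human_approval", "notify_owner")
LIST_FIELDS = ("proposed_actions", "auto_actions")


def validate_hazard_decision(decision: dict) -> list:
    """Return a list of problems found in the decision (empty list means it looks valid)."""
    problems = []
    if decision.get("hazard_type") not in HAZARD_TYPES:
        problems.append("hazard_type is missing or not an allowed value")
    if decision.get("severity") not in SEVERITIES:
        problems.append("severity is missing or not an allowed value")
    for field in BOOL_FIELDS:
        if not isinstance(decision.get(field), bool):
            problems.append(f"{field} must be true or false")
    for field in LIST_FIELDS:
        value = decision.get(field)
        if not isinstance(value, list) or not all(isinstance(a, str) for a in value):
            problems.append(f"{field} must be a list of strings")
    if not isinstance(decision.get("user_message"), str):
        problems.append("user_message must be a string")
    return problems


# Made-up example shaped like the schema in the instruction
sample_decision = {
    "hazard_type": "gas_leak",
    "severity": "high",
    "should_take_immediate_action": True,
    "needs_human_approval": False,
    "proposed_actions": ["ventilate kitchen"],
    "auto_actions": ["turn off main power"],
    "notify_owner": True,
    "user_message": "Gas level is high in the kitchen.",
}

print(validate_hazard_decision(sample_decision))
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that was wrong with a malformed reply.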



### Testing Hazard Agent


In [None]:
HAZARD_APP_NAME = 'safe_home_hazard'
HAZARD_USER_ID = 'hazard_demo_user'
HAZARD_SESSION_ID = 'hazard_session_1'
hazard_session_service = InMemorySessionService()

hazard_session = await hazard_session_service.create_session(
        app_name=HAZARD_APP_NAME,
        user_id=HAZARD_USER_ID,
        session_id=HAZARD_SESSION_ID
    )

# Runner takes the agent, app name, and session service;
# user_id and session_id are supplied later, when calling run_async().
hazard_runner = Runner(
    agent=hazard_agent,
    session_service=hazard_session_service,
    app_name=HAZARD_APP_NAME,
)

### Security Agent

### Command Control Agent
