<a href="https://colab.research.google.com/github/jaynetra/GoogleADKHackathon/blob/main/MultiModalSyntheticAgent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multi modal synthetic data generator with ADK

**Motivation for Multimodal synthetic data generator**

Synthetic data has been around in one form or another for decades. It matters more now because high-quality real-world data is very hard to access for innovation. Many tools address the generation of a single modality. For example, **tabular synthetic data**, which is very useful in the healthcare and financial domains, can be generated with tools such as SDV and dbtwin. Similarly, there are innovative models for generating images and audio. But access to meaningful, connected multimodal synthetic data remains an industry problem, and solving it could fuel innovation. The core idea of this project is a multi-agent, multimodal super agent that creates multimodal synthetic data using various tools. It is designed as a **multi-agent system**, with the main agent delegating the task for each modality to a different subagent.

The system design uses a multi-agent pattern to build the **MultiModalSyntheticDataGeneratorAgent** component. This agent delegates tasks to subagents that each specialize in one modality. The following modalities are covered:

* Synthetic Tabular Data
* Image creation
* Synthetic medical notes

The design is modular, so other modalities can be added easily.
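
The delegation idea can be illustrated without ADK. The sketch below is plain Python, not the ADK API: a hypothetical `MODALITY_HANDLERS` registry maps a modality name to a generator function, and adding a modality means registering one more function. All names and placeholder return values are assumptions made for illustration. In the notebook itself, the routing decision is made by the root agent's model rather than by a dictionary lookup.

```python
from typing import Callable, Dict

# Hypothetical registry: modality name -> generator function.
MODALITY_HANDLERS: Dict[str, Callable[[], str]] = {}


def register(modality: str):
    """Decorator that registers a generator function under a modality name."""
    def decorator(func: Callable[[], str]) -> Callable[[], str]:
        MODALITY_HANDLERS[modality] = func
        return func
    return decorator


@register("tabular")
def make_tabular() -> str:
    return "tabular placeholder"


@register("image")
def make_image() -> str:
    return "image placeholder"


def delegate(modality: str) -> str:
    """Route a request to the handler for its modality, if one is registered."""
    handler = MODALITY_HANDLERS.get(modality)
    if handler is None:
        return f"No sub-agent registered for modality '{modality}'."
    return handler()


# Adding a new modality later is one more registration.
@register("medical_notes")
def make_notes() -> str:
    return "medical notes placeholder"


print(delegate("image"))
print(delegate("audio"))
```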


**Architecture Diagram**



**Submission Requirement Checklist:**


*   ✅ **Categories: Content creation and Generation**

Designing specialized sub-agents and enabling automatic routing (`auto flow`) of user requests to the most appropriate agent within a team.

*   ✅ **Multi Agent:**

Crafting Python functions (`tools`) that grant agents specific abilities (like fetching data) and instructing agents on how to use them effectively.

*   ✅ **Architecture Diagram:**

Crafting Python functions (`tools`) that grant agents specific abilities (like fetching data) and instructing agents on how to use them effectively.

*   ✅ **Tools and Frameworks: ADK:**

Utilizing `Session State` and `ToolContext` to enable agents to remember information across conversational turns, leading to more contextual interactions.

*   ✅ **New Project:**
Implementing `before_model_callback` and `before_tool_callback` to inspect, modify, or block requests/tool usage based on predefined rules, enhancing application safety and control.

*   ✅ **Project Hosting:**
On the Colab environment. The link will be shared for judging.

*   ✅ **Project Text Description:**
On the Colab environment. The link will be shared for judging.

*   ✅ **URL to public repo:**
On the Colab environment. The link will be shared for judging.

*   ✅ **Submission Video Link public:**
On the Colab environment. The link will be shared for judging.


**Future work:**

*   ✅ **Connected synthetic data:** Many industries need connected, multimodal synthetic data that is specific to their domain. Understand the domain and create a framework to connect the modalities.

*   ✅ **Scaling and cost:** Build an understanding of scale and cost. Cost per token is a rule of thumb, but the other factors that drive cost still need to be identified.
  

In [1]:
# @title Step 0: Setup and Installation
# Install ADK, LiteLLM (multi-model support), and SDV (tabular synthetic data)

!pip install google-adk -q
!pip install litellm -q
!pip install sdv -q

print("Installation complete.")


In [2]:
# @title Step 1: Import necessary libraries
import os
import asyncio
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm # For multi-model support
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.genai import types # For creating message Content/Parts

import warnings
# Ignore all warnings
warnings.filterwarnings("ignore")

import logging
logging.basicConfig(level=logging.ERROR)

print("Libraries imported.")

Libraries imported.


In [3]:
# @title Step 2: Configure API Keys
# --- IMPORTANT: Store your Google API key as a Colab secret named 'GOOGLE_KEY' ---

from google.colab import userdata

# Expose the key through the GOOGLE_API_KEY environment variable
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_KEY')
MODEL_GEMINI_2_0_FLASH = "gemini-2.0-flash"
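
The cell above only works inside Colab, because `google.colab` is not available elsewhere. A minimal sketch of a fallback follows, assuming that outside Colab the key is already exported as `GOOGLE_API_KEY`. The helper name `load_google_api_key` is hypothetical and not part of the notebook.

```python
import os


def load_google_api_key(secret_name: str = "GOOGLE_KEY") -> str:
    """Return the API key from Colab secrets, or from the environment outside Colab.

    Hypothetical helper for illustration only.
    """
    try:
        # Only importable inside a Colab runtime
        from google.colab import userdata
        key = userdata.get(secret_name)
    except ImportError:
        # Assumption: outside Colab the key was exported beforehand
        key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("No Google API key found in Colab secrets or GOOGLE_API_KEY.")
    return key


# Usage: os.environ["GOOGLE_API_KEY"] = load_google_api_key()
```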




# Step 3 : Create functions for different modalities

---

## Modality 1: Synthetic tabular data

Use SDV, an open-source framework, to create synthetic tabular data. For hackathon purposes, the tool uses a public demo dataset that ships with the framework. In real scenarios, preparing the source data takes a lot of effort.

1. **A Tool:** A Python function that equips the agent with the *ability* to get synthetic data. Because resources are limited and the goal is to understand the ADK concepts, it returns only 5 rows, serialized as JSON records. Since the model we use only outputs text, JSON is a good way to represent the output.

2. **An Agent:** The AI "brain" that understands the user's request, knows it has a tabular data generation tool, and decides when and how to use it.
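
To make the "JSON records" layout concrete, here is a small sketch using only the standard library. The sample string is hypothetical: the column names follow the `fake_hotel_guests` demo table, but the values are made up for illustration. Each row is one JSON object, and the whole table is a JSON array.

```python
import json

# Hypothetical sample in the "records" layout: one JSON object per row.
sample_json = (
    '[{"room_type": "BASIC", "has_rewards": false, "amenities_fee": 0.29},'
    ' {"room_type": "DELUXE", "has_rewards": true, "amenities_fee": 12.5}]'
)

# Parsing gives a list of dicts, one per row
rows = json.loads(sample_json)
for index, row in enumerate(rows):
    print(index, row["room_type"], row["amenities_fee"])

# Column names can be recovered from any row
column_names = sorted(rows[0].keys())
print(column_names)
```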




In [4]:
# @title Define the get_tabular_synthetic_data function
# Uses SDV (sdv.dev), so there are no API keys to manage.

import pandas as pd
from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer

def get_tabular_synthetic_data() -> str:
    """Generates a small synthetic sample of the SDV 'fake_hotel_guests' demo dataset.

    Returns:
        str: A JSON string with 5 synthetic rows in 'records' orientation
             (a list with one object per row).
    """
    # Download the public demo table and its metadata
    real_data, metadata = download_demo(
        modality='single_table',
        dataset_name='fake_hotel_guests')

    # Fit a Gaussian copula model on the real table, then sample new rows
    synthesizer = GaussianCopulaSynthesizer(metadata)
    synthesizer.fit(data=real_data)

    df_synth = synthesizer.sample(num_rows=5)

    return df_synth.to_json(orient='records')
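
The tool above returns a raw JSON string, so a failure inside the download or the synthesizer surfaces as an exception. One possible pattern is to wrap the call and report the outcome in a dict with a `status` key. The sketch below assumes that convention; `run_tool_safely`, `fake_tool`, and `broken_tool` are hypothetical names, and the stand-in tools replace the real SDV call so the example runs anywhere.

```python
import json
from typing import Callable


def run_tool_safely(tool: Callable[[], str]) -> dict:
    """Call a tool that returns a JSON string and wrap the outcome in a status dict."""
    try:
        records = json.loads(tool())
    except Exception as exc:  # report any failure instead of raising
        return {"status": "error", "error_message": str(exc)}
    return {"status": "success", "records": records}


def fake_tool() -> str:
    # Stand-in for a tool that succeeds
    return '[{"room_type": "BASIC"}]'


def broken_tool() -> str:
    # Stand-in for a tool that fails
    raise RuntimeError("dataset download failed")


print(run_tool_safely(fake_tool))
print(run_tool_safely(broken_tool))
```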


---

## Modality 2: Synthetic Image

Use the Google Gen AI SDK with Google's image generation model.

1. **A Tool:** A Python function that equips the agent with the *ability* to get a synthetic image. It uses the Google image generation model *gemini-2.0-flash-preview-image-generation*.
2. **An Agent:** The AI "brain" that understands the user's request, knows it has an image generation tool, and decides when and how to use it.

3. **Output:** The image is returned as JSON, because the agent runs on a Gemini Flash text model that accepts multimodal input but generates only text.
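
The tools in this notebook serialize the image as nested lists of pixel values. A different option, not used in the notebook, is to embed the encoded image bytes as base64 text inside a JSON object, which keeps the payload compact and reversible. The sketch below assumes that approach; the function names and the payload keys `mime_type` and `data_base64` are hypothetical.

```python
import base64
import json


def image_bytes_to_json(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Wrap raw image bytes in a JSON string using base64 text."""
    payload = {
        "mime_type": mime_type,
        "data_base64": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)


def json_to_image_bytes(payload_json: str) -> bytes:
    """Recover the original image bytes from the JSON payload."""
    payload = json.loads(payload_json)
    return base64.b64decode(payload["data_base64"])


# The 8-byte PNG signature stands in for a real image file here
fake_png = b"\x89PNG\r\n\x1a\n"
encoded = image_bytes_to_json(fake_png)
decoded = json_to_image_bytes(encoded)
print(decoded == fake_png)
```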


In [5]:
# @title Define Tools for image synthetic agent (Google image generation)
import json
from io import BytesIO

import numpy as np
from PIL import Image
from google import genai
from google.genai import types


def get_synthetic_image_google() -> str:
    """Generates a synthetic image with Google's image generation model.

    Returns:
        str: A JSON string holding the first 100 pixel rows of the image
             as nested lists of channel values.
    """
    # Fallback image in case the response contains no image part
    image = Image.new('RGB', (25, 25), 'red')
    contents = "generate a red square image simple one and a tiny one"
    client = genai.Client()

    response = client.models.generate_content(
        model="gemini-2.0-flash-preview-image-generation",
        contents=contents,
        config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE']))

    for part in response.candidates[0].content.parts:
        if part.text is not None:
            print(part.text)
        if part.inline_data is not None:
            image = Image.open(BytesIO(part.inline_data.data))
            image.save('gemini-native-image.png')

    # Keep only the first 100 pixel rows to limit the size of the tool result
    return json.dumps(np.array(image).tolist()[:100])




In [6]:
# @title Define Tools for image synthetic agent (Stable Diffusion)
import json

import numpy as np
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler


def get_synthetic_image() -> str:
    """Generates a synthetic image with Stable Diffusion 2 (requires a CUDA GPU).

    Returns:
        str: The first 100 characters of the image's JSON pixel array.
             This is a truncated preview, not valid JSON.
    """
    model_id = "stabilityai/stable-diffusion-2"
    # Load the scheduler and the pipeline in half precision, then move to the GPU
    scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
    pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")

    prompt = "a simple cat"
    image = pipe(prompt).images[0]

    # Serialize the pixels as nested lists and keep only a short preview
    json_data = json.dumps(np.array(image).tolist())
    return json_data[:100]




In [7]:
get_synthetic_image()

'[[[129, 129, 128], [129, 128, 129], [125, 124, 124], [129, 128, 128], [127, 126, 127], [128, 127, 12'
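
The preview string printed above stops mid-number: slicing a JSON string by characters generally leaves text that can no longer be parsed. The sketch below contrasts that with slicing the pixel rows before serializing, which keeps the result valid. It uses a small hypothetical pixel grid instead of a real image so that it runs without a GPU.

```python
import json

# Hypothetical 50-row, 4-pixel-wide grid of [R, G, B] values
pixels = [[[r, 0, 0] for r in range(4)] for _ in range(50)]

# Option 1: serialize everything, then cut the string at 100 characters
full_json = json.dumps(pixels)
cut_text = full_json[:100]
try:
    json.loads(cut_text)
    cut_is_valid = True
except json.JSONDecodeError:
    cut_is_valid = False

# Option 2: keep only the first rows, then serialize
first_rows_json = json.dumps(pixels[:2])
first_rows = json.loads(first_rows_json)

print(cut_is_valid, len(first_rows))
```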

---

# Step 4 : Create agents for different modalities
Define the Tabular Synthetic Agent and the Synthetic Image Agent.




In [8]:
# @title Define the Tabular Synthetic Agent
# Use one of the model constants defined earlier
AGENT_MODEL = MODEL_GEMINI_2_0_FLASH # Starting with Gemini

synth_agent = Agent(
    name="synthetic_agent_v1",
    model=AGENT_MODEL,  # Can be a string for Gemini or a LiteLlm object
    description="Provides synthetic tabular data.",  # Describes this agent's specialty to the root agent
    instruction="You are a helpful synthetic data generator that can generate connected multimodal data, such as text and images.",
    tools=[get_tabular_synthetic_data])  # Pass the function directly



In [9]:
# @title Define the Image Synthetic Agent
# Use one of the model constants defined earlier
AGENT_MODEL = MODEL_GEMINI_2_0_FLASH # Starting with Gemini

synth_image_agent = Agent(
    name="synthetic_image_agent_v1",
    model=AGENT_MODEL,  # Can be a string for Gemini or a LiteLlm object
    description="Provides synthetic image data.",  # Describes this agent's specialty to the root agent
    instruction="You are a helpful synthetic data generator that can generate connected multimodal data, such as text and images.",
    tools=[get_synthetic_image])  # Pass the function directly



# Step 6 : Set up master agent and subagents, agent delegation model

In [10]:
# @title Define the Root Agent with Sub-Agents for multimodal agent

# Ensure both sub-agents were created successfully before defining the root agent.
multi_modal_super_agent = None
if synth_agent and synth_image_agent:
    # Use a capable Gemini model for the root agent to handle orchestration
    root_agent_model = AGENT_MODEL
    multi_modal_super_agent = Agent(
        name="multi_modal_super_agent_v1",
        model=root_agent_model,
        description="The main multimodal agent. Delegates requests to the tabular and image generator sub-agents.",
        # The sub-agents are referred to by their agent names, not by their Python variable names
        instruction="You are the Super Multimodal Synthetic Agent manager. Your primary responsibility is to provide tabular and image synthetic data. "
                    "You have specialized sub-agents: "
                    "1. 'synthetic_agent_v1': Handles tabular data. "
                    "2. 'synthetic_image_agent_v1': Handles image data. "
                    "Analyze the user's query. If it's a tabular synthetic data request, delegate to 'synthetic_agent_v1'. "
                    "If it's an image request, delegate to 'synthetic_image_agent_v1'. "
                    "For anything else, respond appropriately or state you cannot handle it.",
        # Key change: Link the sub-agents here!
        sub_agents=[synth_agent, synth_image_agent]
    )
    print(f"✅ Root Agent '{multi_modal_super_agent.name}' created using model '{root_agent_model}' ")
else:
    print("❌ Cannot create root agent because one or more sub-agents failed to initialize.")


✅ Root Agent 'multi_modal_super_agent_v1' created using model 'gemini-2.0-flash' 


# Step 7 : How the agent interacts with the functions in the Colab environment

In [11]:
# @title Define Agent Interaction Function

from google.genai import types # For creating message Content/Parts

async def call_agent_async(query: str, runner, user_id, session_id):
  """Sends a query to the agent and prints the final response."""
  print(f"\n>>> User Query: {query}")

  # Prepare the user's message in ADK format
  content = types.Content(role='user', parts=[types.Part(text=query)])

  final_response_text = "Agent did not produce a final response." # Default

  # Key Concept: run_async executes the agent logic and yields Events.
  # We iterate through events to find the final answer.
  async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=content):
      # Print every event for debugging; comment this line out for quieter output
      print(f"  [Event] Author: {event.author}, Type: {type(event).__name__}, Final: {event.is_final_response()}, Content: {event.content}")
      if event.content and event.content.parts:
          if event.get_function_calls():
              print("  Type: Tool Call Request")
          elif event.get_function_responses():
              print("  Type: Tool Result")
              # Use .get so a response without a 'result' key does not raise a KeyError
              tool_response = event.get_function_responses()[0].response or {}
              print(tool_response.get('result'))

      # Key Concept: is_final_response() marks the concluding message for the turn.
      if event.is_final_response():
          if event.content and event.content.parts:
             # Assuming text response in the first part
             final_response_text = event.content.parts[0].text
          elif event.actions and event.actions.escalate: # Handle potential errors/escalations
             final_response_text = f"Agent escalated: {event.error_message or 'No specific message.'}"
          # Add more checks here if needed (e.g., specific error codes)
          break # Stop processing events once the final response is found

  print(f"<<< Agent Response: {final_response_text}")
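
The event loop pattern above can be tried without ADK or an API key. The sketch below uses a hypothetical `FakeEvent` class and a fake async generator in place of `runner.run_async`; only the iterate-until-final logic mirrors the notebook. Every name in it is an assumption made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class FakeEvent:
    """Hypothetical stand-in for an ADK event; only the fields used below."""
    author: str
    text: Optional[str]
    final: bool

    def is_final_response(self) -> bool:
        return self.final


async def fake_run_async() -> AsyncIterator[FakeEvent]:
    # Simulates a delegation, a tool call, and then the final answer
    yield FakeEvent("root_agent", None, False)
    yield FakeEvent("sub_agent", None, False)
    yield FakeEvent("sub_agent", "done", True)


async def last_response() -> str:
    answer = "Agent did not produce a final response."
    async for event in fake_run_async():
        if event.is_final_response():
            if event.text:
                answer = event.text
            break  # stop once the final response is found
    return answer


result = asyncio.run(last_response())
print(result)
```

Inside a notebook, where an event loop is already running, `await last_response()` would replace the `asyncio.run(...)` call.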


# Step 8 : Finally, set up the super agent, its session, and some prompts to verify that the subagents are called correctly

In [12]:
# @title Interact with the Super Agent
import asyncio # Ensure asyncio is imported
from google.adk.sessions import InMemorySessionService
from google.adk.artifacts import InMemoryArtifactService
from google.adk.runners import Runner

# Ensure the root agent ('multi_modal_super_agent' from the previous step) is defined.
# Ensure the call_agent_async function is defined.

# Define the main async function for the conversation logic.
# The 'await' keywords INSIDE this function are necessary for async operations.
async def run_synthetic_conversation():
    print("\n--- Testing Agent Team Delegation ---")
    session_service = InMemorySessionService()
    artifact_service = InMemoryArtifactService()

    APP_NAME = "multi_modal_synthetic_agent"
    USER_ID = "user_1_synthetica_agent"
    SESSION_ID = "session_001_synthetic_agent"
    session = await session_service.create_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
    )
    print(f"Session created: App='{APP_NAME}', User='{USER_ID}', Session='{SESSION_ID}'")

    runner_agent_team = Runner( # Or use InMemoryRunner
        agent=multi_modal_super_agent,
        app_name=APP_NAME,
        session_service=session_service,
        artifact_service=artifact_service
    )
    print(f"Runner created for agent '{multi_modal_super_agent.name}'.")

    # --- Interactions using await (correct within async def) ---
    await call_agent_async(query="generate an image",
                           runner=runner_agent_team,
                           user_id=USER_ID,
                           session_id=SESSION_ID)

    await call_agent_async(query="generate tabular data",
                           runner=runner_agent_team,
                           user_id=USER_ID,
                           session_id=SESSION_ID)


# Only run the conversation if the root agent was created successfully
if globals().get('multi_modal_super_agent'):
    print("Attempting execution using 'await' (default for notebooks)...")
    await run_synthetic_conversation()
else:
    print("⚠️ Root agent 'multi_modal_super_agent' not found. Cannot run the conversation.")



Attempting execution using 'await' (default for notebooks)...

--- Testing Agent Team Delegation ---
Session created: App='multi_modal_synthetic_agent', User='user_1_synthetica_agent', Session='session_001_synthetic_agent'
Runner created for agent 'multi_modal_super_agent_v1'.

>>> User Query: generate an image




  [Event] Author: multi_modal_super_agent_v1, Type: Event, Final: False, Content: parts=[Part(video_metadata=None, thought=None, inline_data=None, file_data=None, thought_signature=None, code_execution_result=None, executable_code=None, function_call=FunctionCall(id='adk-41c26d75-3247-4888-aab8-8126c6241992', args={'agent_name': 'synthetic_image_agent_v1'}, name='transfer_to_agent'), function_response=None, text=None)] role='model'
  Type: Tool Call Request
  [Event] Author: multi_modal_super_agent_v1, Type: Event, Final: False, Content: parts=[Part(video_metadata=None, thought=None, inline_data=None, file_data=None, thought_signature=None, code_execution_result=None, executable_code=None, function_call=None, function_response=FunctionResponse(will_continue=None, scheduling=None, id='adk-41c26d75-3247-4888-aab8-8126c6241992', name='transfer_to_agent', response={'result': None}), text=None)] role='user'
  Type: Tool Result
None




  [Event] Author: synthetic_image_agent_v1, Type: Event, Final: False, Content: parts=[Part(video_metadata=None, thought=None, inline_data=None, file_data=None, thought_signature=None, code_execution_result=None, executable_code=None, function_call=FunctionCall(id='adk-bdf5dc02-4f41-49cf-9f7c-278e367e58ec', args={}, name='get_synthetic_image'), function_response=None, text=None)] role='model'
  Type: Tool Call Request


Loading pipeline components...:   0%|          | 0/6 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]

  [Event] Author: synthetic_image_agent_v1, Type: Event, Final: False, Content: parts=[Part(video_metadata=None, thought=None, inline_data=None, file_data=None, thought_signature=None, code_execution_result=None, executable_code=None, function_call=None, function_response=FunctionResponse(will_continue=None, scheduling=None, id='adk-bdf5dc02-4f41-49cf-9f7c-278e367e58ec', name='get_synthetic_image', response={'result': '[[[79, 72, 67], [80, 74, 69], [79, 73, 68], [81, 75, 70], [82, 76, 70], [80, 74, 69], [80, 74, 69], '}), text=None)] role='user'
  Type: Tool Result
[[[79, 72, 67], [80, 74, 69], [79, 73, 68], [81, 75, 70], [82, 76, 70], [80, 74, 69], [80, 74, 69], 
  [Event] Author: synthetic_image_agent_v1, Type: Event, Final: True, Content: parts=[Part(video_metadata=None, thought=None, inline_data=None, file_data=None, thought_signature=None, code_execution_result=None, executable_code=None, function_call=None, function_response=None, text='I have generated a synthetic X-ray image.\n



  [Event] Author: synthetic_image_agent_v1, Type: Event, Final: False, Content: parts=[Part(video_metadata=None, thought=None, inline_data=None, file_data=None, thought_signature=None, code_execution_result=None, executable_code=None, function_call=FunctionCall(id='adk-01803868-78aa-4a70-9c29-b1239f6212d9', args={'agent_name': 'synthetic_agent_v1'}, name='transfer_to_agent'), function_response=None, text=None)] role='model'
  Type: Tool Call Request
  [Event] Author: synthetic_image_agent_v1, Type: Event, Final: False, Content: parts=[Part(video_metadata=None, thought=None, inline_data=None, file_data=None, thought_signature=None, code_execution_result=None, executable_code=None, function_call=None, function_response=FunctionResponse(will_continue=None, scheduling=None, id='adk-01803868-78aa-4a70-9c29-b1239f6212d9', name='transfer_to_agent', response={'result': None}), text=None)] role='user'
  Type: Tool Result
None




  [Event] Author: synthetic_agent_v1, Type: Event, Final: False, Content: parts=[Part(video_metadata=None, thought=None, inline_data=None, file_data=None, thought_signature=None, code_execution_result=None, executable_code=None, function_call=FunctionCall(id='adk-4954d50c-4852-4c69-81a3-7780733ed95f', args={}, name='get_tabular_synthetic_data'), function_response=None, text=None)] role='model'
  Type: Tool Call Request
  [Event] Author: synthetic_agent_v1, Type: Event, Final: False, Content: parts=[Part(video_metadata=None, thought=None, inline_data=None, file_data=None, thought_signature=None, code_execution_result=None, executable_code=None, function_call=None, function_response=FunctionResponse(will_continue=None, scheduling=None, id='adk-4954d50c-4852-4c69-81a3-7780733ed95f', name='get_tabular_synthetic_data', response={'result': '[{"guest_email":"dsullivan@example.net","has_rewards":false,"room_type":"BASIC","amenities_fee":0.29,"checkin_date":"27 Mar 2020","checkout_date":"09 Mar

# Next Steps: The above 8 steps cover the MVP functionality needed to get an understanding of ADK. Still to explore: deploying on Vertex AI, storing images using the artifact service, applying guardrails, etc.
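
As a starting point for the guardrails item, the rule check itself can be prototyped in plain Python before it is wired into an agent. The sketch below is a minimal keyword filter. `BLOCKED_TERMS` and `check_prompt` are hypothetical names, and the term list is only an example. How such a check would be attached to an ADK callback is not shown here.

```python
from typing import Optional

# Hypothetical example terms that a synthetic data request should not contain
BLOCKED_TERMS = ("real patient", "social security number")


def check_prompt(prompt: str) -> Optional[str]:
    """Return a refusal message if the prompt contains a blocked term, else None."""
    lowered = prompt.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return f"Request blocked: prompt mentions '{term}'."
    return None


print(check_prompt("Generate five synthetic hotel guests"))
print(check_prompt("Copy the real patient records"))
```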