<a href="https://colab.research.google.com/github/its-emile/memory-safe-agent/blob/main/Memory_safe_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# memory-safe (blind) agentic simulation

Hypothesis: an LLM can solve an agentic task without seeing any of the intermediate data passed between tool calls, while every tool strictly controls the flow of its input and output data through a policy, guarding against unbounded data flow through the LLM.

In this example simulation, we store a log line without the LLM ever seeing the timestamp that is being stored.

In a real-life scenario, this would let the agent orchestrate calls to privileged tools without exposing the tools' authentication parameters, or any sensitive intermediate outputs, to the agent or to unintended tools.
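
The core mechanism can be illustrated without any model. Under the policy format this notebook uses for `TOOL_LIST` (`"tool:param"` strings in `allowed_sources` and `allowed_sinks`), a value may move from one tool's output to another tool's input only if both ends agree. The sketch below is a minimal illustration under that assumption: `flow_allowed` is a hypothetical helper, the example policy adds an `event` source to `store_log` for illustration, and `"_"` stands for the init step, whose literals have no sink policy of their own.

```python
from typing import Dict, List

# Assumed policy layout, mirroring the notebook's TOOL_LIST: per tool, which
# "tool:param" sources each input accepts and which sinks each output allows.
Policy = Dict[str, Dict[str, Dict[str, List[str]]]]


def flow_allowed(tools: Policy, src_tool: str, out_param: str,
                 dst_tool: str, in_param: str) -> bool:
    """True only if both ends of src_tool:out_param -> dst_tool:in_param agree."""
    # Source side: the output must list this destination as an allowed sink.
    # "_" (the init step) carries user literals and has no sink policy.
    if src_tool != "_":
        sinks = tools.get(src_tool, {}).get("allowed_sinks", {}).get(out_param, [])
        if f"{dst_tool}:{in_param}" not in sinks:
            return False
    # Destination side: the input must list this source as an allowed source.
    sources = tools.get(dst_tool, {}).get("allowed_sources", {}).get(in_param, [])
    return f"{src_tool}:{out_param}" in sources


tools: Policy = {
    "get_time": {
        "allowed_sources": {"fmt": ["_:fmt"]},
        "allowed_sinks": {"time": ["store_log:time"]},
    },
    "store_log": {
        "allowed_sources": {"time": ["get_time:time"], "event": ["_:event"]},
        "allowed_sinks": {"log_lines": ["result:log_lines"]},
    },
}

print(flow_allowed(tools, "_", "fmt", "get_time", "fmt"))            # True
print(flow_allowed(tools, "get_time", "time", "store_log", "time"))  # True
print(flow_allowed(tools, "get_time", "time", "result", "time"))     # False
```

The third call is rejected on the source side alone: `get_time` never lists `result:time` as a sink, so the destination's own policy is not even consulted.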

In [1]:
from google.colab import userdata
from google import genai
from abc import ABC, abstractmethod
from pydantic import BaseModel

# Use Gemini for now since it has a free API.
# Ideally we would end up using two models from different labs.
GEMINI_CLIENT = genai.Client(api_key=userdata.get("GOOGLE_API_KEY"))


class Model(ABC):
    @abstractmethod
    def model_json(self, message: str):
        """Return the raw model response; callers read the JSON plan from its .text attribute."""

class Gemini(Model):
    def __init__(self):
        self.client = GEMINI_CLIENT

    def model_json(self, message: str):
        # Returns the full response object (not a str); its .text holds the JSON plan.
        return self.client.models.generate_content(
            model = "gemini-2.0-flash",
            contents = message,
            config={
              'response_mime_type': 'application/json',
              'response_schema': {
                "type": "array",
                "items": {
                  "type": "object",
                  "properties": {
                    "method_name": {
                      "type": "string",
                    },
                    "parameter_dict": {
                      "type": "object",
                      "properties": {
                        "fmt": {
                          "type": "string",
                        },
                        "event": {
                          "type": "string",
                        },
                        "time": {
                          "type": "string",
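                        },
                        # Lets the final "result" step reference B:log_lines as the
                        # template's example does; otherwise the schema has no such key.
                        "log_lines": {
                          "type": "string",
                        }
                      }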
                    },
                    "call_id": {
                      "type": "string",
                    }
                  },
                  "required": ["method_name","call_id"],
                },
              },
            },
        )
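
Because the model only ever returns a plan, it can be swapped for a canned one when testing the data-flow machinery offline. The following is a sketch, not part of the notebook: `CannedModel` is a hypothetical stub that ignores its prompt and returns a `types.SimpleNamespace` whose `.text` holds a fixed JSON plan, assuming (as `Agent.__init__` does) that callers read the plan from `.text`. It redefines a minimal `Model` base so it runs on its own.

```python
import json
from abc import ABC, abstractmethod
from types import SimpleNamespace
from typing import Any, Dict, List


class Model(ABC):
    @abstractmethod
    def model_json(self, message: str) -> Any:
        """Return an object whose .text attribute holds the JSON plan."""


class CannedModel(Model):
    """Hypothetical offline stand-in: ignores the prompt and replays a fixed plan."""

    def __init__(self, plan: List[Dict[str, Any]]):
        self.plan = plan
        self.prompts: List[str] = []  # prompts received, kept for inspection

    def model_json(self, message: str) -> SimpleNamespace:
        self.prompts.append(message)
        # Same shape the notebook relies on: the JSON plan as a string in .text.
        return SimpleNamespace(text=json.dumps(self.plan))


# A canned plan modeled on the template's example (values are illustrative).
plan = [
    {"method_name": "init", "parameter_dict": {"fmt": "%H:%M:%S", "event": "test"}, "call_id": "_"},
    {"method_name": "get_time", "parameter_dict": {"fmt": "_:fmt"}, "call_id": "A"},
    {"method_name": "store_log", "parameter_dict": {"time": "A:time", "event": "_:event"}, "call_id": "B"},
    {"method_name": "result", "parameter_dict": {"log_lines": "B:log_lines"}, "call_id": "result"},
]

model = CannedModel(plan)
response = model.model_json("any prompt")
print(json.loads(response.text) == plan)  # True
```

Wiring such a stub into `Agent` would need a small change, since `Agent` constructs `Gemini()` itself; that change is an assumption, not something the notebook provides.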

In [2]:
class Data_Safe_MCP(ABC):

  # A fuller implementation would use a validation library to ensure every object follows a schema.
  @property
  @abstractmethod
  def instruction_list(self):
      """List of tool instruction objects"""
      pass

  # @instruction_list.setter
  # @ abstractmethod
  # def instruction_list(self, value):
  #     """Sets tool instruction list"""
  #     pass

  @property
  @abstractmethod
  def tool_list(self):
      """Tools in object format"""
      pass

  # @tool_list.setter
  # @ abstractmethod
  # def tool_list(self, value):
  #     """Sets tool list"""
  #     pass

class Structured_Instruction_Flow(Data_Safe_MCP):
  def __init__(self, instruction_list, tool_list):
      self._instruction_list = instruction_list
      self._tool_list = tool_list

  @property
  def instruction_list(self):
      return self._instruction_list

  @property
  def tool_list(self):
      return self._tool_list
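
The comment in `Data_Safe_MCP` anticipates a validation library for instruction objects. As a dependency-free stand-in (pydantic, already imported above, would be a natural choice but is not used here), the hypothetical `validate_instruction` below checks that an object carries the keys the agent template asks for, with the expected types. The key set is an assumption drawn from the template, not a schema the notebook enforces.

```python
from typing import Any, Dict, List

# Keys and value types each instruction object is assumed to need,
# following the fields the agent template asks for.
REQUIRED_KEYS = {"method_name": str, "parameter_dict": dict, "call_id": str}
OPTIONAL_KEYS = {"out_params": list}


def validate_instruction(obj: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the object looks well-formed."""
    problems: List[str] = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in obj:
            problems.append(f"missing {key}")
        elif not isinstance(obj[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    for key, expected in OPTIONAL_KEYS.items():
        if key in obj and not isinstance(obj[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    # Reject keys the assumed schema does not know about.
    unknown = set(obj) - set(REQUIRED_KEYS) - set(OPTIONAL_KEYS)
    problems.extend(f"unexpected key {k}" for k in sorted(unknown))
    # Parameter values are strings: literals in init, "call_id:out_param" elsewhere.
    params = obj.get("parameter_dict")
    if isinstance(params, dict):
        for name, value in params.items():
            if not isinstance(value, str):
                problems.append(f"parameter {name} should be a string")
    return problems


good = {"method_name": "get_time", "parameter_dict": {"fmt": "_:fmt"},
        "call_id": "A", "out_params": ["time"]}
bad = {"method_name": "get_time", "parameter_dict": {"fmt": 3}, "extra": 1}

print(validate_instruction(good))  # []
print(validate_instruction(bad))   # ['missing call_id', 'unexpected key extra', 'parameter fmt should be a string']
```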

In [3]:
import random
import time
import numpy as np
from dataclasses import dataclass
from typing import List, Dict, Any, Tuple, Optional, Union
import json

AGENT_TEMPLATE="""
You are a highly capable agent, but you are operating in a blind environment. You have access to the tools below, but you won't see the data passed between them:
{agent_tools}

To fulfill the task below, you will write an execution flow in JSON as a list of {{method_name, parameter_dict, call_id, out_params}} objects.
Rule 1: You will not see the output data of any call, but you can reference any output parameter by writing it as call_id:out_param_name.
Rule 2: For security reasons, only the first method (init) can accept literals, and it must have call_id = "_"; any other method call will fail if its parameter_dict contains values other than call_id:out_param_name references.
Rule 3: For security reasons, the last method must have "method_name":"result" and "call_id":"result", and its "parameter_dict" must reference all the information the user may need about the completed task.
Rule 4: The constraints in each tool above are hard constraints (data passing fails unless the source output explicitly allows data to the destination input and the destination input explicitly allows data from the source output).

Example (supposing tools get_time(fmt) -> time and store_log(time, event) -> log_lines):

{tool_instructions}

Your task to fulfill is:
<usertask>
{task}
</usertask>

Now carefully design an execution flow for this task and output it only as valid JSON formatted as in the example. Don't forget to pass each method call all the parameters it requires:
"""

INSTRUCTION_LIST = [
{"method_name":"init", "parameter_dict":{"fmt":"%H:%M:%S", "event":"factorized 1024 bit RSA"}, "call_id":"_", "out_params":["fmt","event"]},
{"method_name":"get_time", "parameter_dict":{"fmt":"_:fmt"}, "call_id":"A", "out_params":["time"]},
{"method_name":"store_log", "parameter_dict":{"time":"A:time", "event":"_:event"}, "call_id":"B", "out_params":["log_lines"]},
{"method_name":"result", "parameter_dict":{"log_lines":"B:log_lines"}, "call_id":"result", "out_params":["log_lines"]}
]

TOOL_LIST = {
    "get_time": {
        "allowed_sources": {
            "fmt": ["_:fmt"]
        },
        "allowed_sinks": {
            "time": ["store_log:time"]
        },
        "out_params": ["time"]
    },
    "store_log": {
        "allowed_sources": {
            "time": ["get_time:time"],
            # The init step's event literal must be whitelisted too, or the
            # example plan's "event": "_:event" reference is rejected.
            "event": ["_:event"]
        },
        "allowed_sinks": {
            "log_lines": ["result:log_lines"]
        },
        "out_params": ["log_lines"]
    }
}

tools_and_instructions = Structured_Instruction_Flow(INSTRUCTION_LIST,TOOL_LIST)

class Agent(ABC):
  def __init__(self, task: str, instruction_flow: Structured_Instruction_Flow):
    self.model = Gemini()
    self.instructions = instruction_flow.instruction_list
    self.tools = instruction_flow.tool_list
    self.task = task
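    model_attempt = self.model.model_json(AGENT_TEMPLATE.format(
        # Serialize as JSON so the prompt shows the same format the model must produce.
        tool_instructions=json.dumps(self.instructions, indent=2),
        agent_tools=json.dumps(self.tools, indent=2),
        task=task,
    )).text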
    print("Model attempt:",model_attempt)
    self.plan = json.loads(model_attempt)
    self.call_results={}
    self.call_methods={}

  def process(self):
    k = 0
    if self.plan[0]["call_id"]=="_":
      self.call_results["_"]=self.plan[0]["parameter_dict"]
      k=1
    for step in self.plan[k:]:
      self.call_methods[step["call_id"]]=step["method_name"]

      if step["call_id"]=="result":
        self.call_results["result"] = self.memory_fetch_unsafe(step["parameter_dict"])
        break
      else:
        self.call_results[step["call_id"]] = self.method_call(step["method_name"], step["parameter_dict"])

  def get_results(self):
    return self.call_results["result"]

  def method_call(self, method_name: str, params: Dict[str, Any]):
    if method_name not in self.tools.keys():
      raise ValueError(f"Unknown method name: {method_name}")

    param_values = self.memory_fetch_safe(method_name, params)

    print(f"Calling {method_name} with {param_values}")

    method = getattr(self, method_name)
    res = method(**param_values)

    print(f"Result: {res}")
    return res

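  def memory_fetch_safe(self, method_name: str, param_dict: Dict[str, Any]) -> Dict[str, Any]:
    for p, ref in param_dict.items():
      # Verify that this data flow is allowed by both its source and its destination.
      print(f"verifying param ({p}) access policy for {method_name} (source: {ref})")
      if not isinstance(ref, str) or ":" not in ref:
        # Rule 2: only the init step may pass literals.
        raise ValueError(f"{method_name}:{p} must be a call_id:out_param reference, got {ref!r}")
      source_call, source_param = ref.split(":", 1)
      if source_call != "_" and source_call not in self.call_methods:
        raise ValueError(f"Unknown call id: {source_call}")
      # "_" is the init pseudo-call; any other call id resolves to the tool it invoked.
      source_method = self.call_methods[source_call] if source_call != "_" else "_"

      # The source tool must list this destination among the allowed sinks of that output.
      if source_method != "_":
        allowed_sinks = self.tools[source_method]["allowed_sinks"].get(source_param, [])
        if f'{method_name}:{p}' not in allowed_sinks:
          raise ValueError(f"{source_call} does not authorize {method_name} to read {source_param}")

      # The destination tool must list this source among the allowed sources of that input.
      allowed_sources = self.tools[method_name]["allowed_sources"].get(p, [])
      if f'{source_method}:{source_param}' not in allowed_sources:
        raise ValueError(f"{method_name} does not authorize reading {source_param} from {source_call}")
    # Every parameter is permitted by both its source and its destination.
    return self.memory_fetch_unsafe(param_dict)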

  def memory_fetch_unsafe(self, param_dict: Dict[str, Any]) -> Dict[str, Any]:
    # Resolves call_id:out_param references to stored values. It performs no policy
    # check, so callers must validate the data flow first (as memory_fetch_safe does).
    params={}
    for k in param_dict.keys():
      print(f'loading {k} = {param_dict[k]}')
      key, value = param_dict[k].split(":", 1)
      if key not in self.call_results.keys():
        raise ValueError(f"Unknown call id: {key}")
      if value not in self.call_results[key].keys():
        # "_" (init) has no entry in call_methods, so fall back to the call id itself.
        raise ValueError(f"Unknown output from {self.call_methods.get(key, key)}: {value}")
      params[k] = self.call_results[key][value]
      print(f'loaded {k} = {params[k]}')
    return params

  # simulated tools for testing purposes
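  def get_time(self, fmt: str) -> Dict[str, str]:
    return {"time":time.strftime(fmt)}

  def store_log(self, time: str, event: str | None = None) -> Dict[str, int]:
    # Simulated: prints instead of writing to storage and reports one log line written.
    print(f"storing log {time}: {event}")
    return {"log_lines":1}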
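
Rules 1 to 3 in the template are structural, so a parsed plan can be checked before any tool runs. The hypothetical `lint_plan` helper below is a sketch of that idea, not part of the notebook. It assumes references look like `call_id:out_param` and, since `process()` executes steps in order, that each reference must point at an earlier call. It reports every problem at once instead of failing on the first one at run time. The sample plan is the template example from `INSTRUCTION_LIST` with `out_params` omitted.

```python
from typing import Any, Dict, List


def lint_plan(plan: List[Dict[str, Any]]) -> List[str]:
    """Check a parsed plan against the template's structural rules; return the problems found."""
    if not plan:
        return ["plan is empty"]
    problems: List[str] = []
    # Rule 3: the plan must end with the result step.
    last = plan[-1]
    if last.get("method_name") != "result" or last.get("call_id") != "result":
        problems.append("last step must be method_name=result, call_id=result")
    seen = set()  # call ids defined by earlier steps
    for i, step in enumerate(plan):
        call_id = step.get("call_id")
        # Rule 2: only a first step with call_id "_" (init) may carry literals.
        if not (i == 0 and call_id == "_"):
            for name, value in step.get("parameter_dict", {}).items():
                parts = value.split(":", 1) if isinstance(value, str) else []
                if len(parts) != 2:
                    problems.append(f"step {call_id}: {name}={value!r} is a literal, not a reference")
                elif parts[0] not in seen:
                    # Rule 1 (assumed ordering): references must name an earlier call.
                    problems.append(f"step {call_id}: {name} references unknown or later call {parts[0]!r}")
        if call_id in seen:
            problems.append(f"duplicate call_id {call_id!r}")
        seen.add(call_id)
    return problems


plan = [
    {"method_name": "init", "parameter_dict": {"fmt": "%H:%M:%S", "event": "factorized 1024 bit RSA"}, "call_id": "_"},
    {"method_name": "get_time", "parameter_dict": {"fmt": "_:fmt"}, "call_id": "A"},
    {"method_name": "store_log", "parameter_dict": {"time": "A:time", "event": "_:event"}, "call_id": "B"},
    {"method_name": "result", "parameter_dict": {"log_lines": "B:log_lines"}, "call_id": "result"},
]
print(lint_plan(plan))  # []

bad = [
    {"method_name": "get_time", "parameter_dict": {"fmt": "%H"}, "call_id": "A"},
    {"method_name": "store_log", "parameter_dict": {"time": "C:time"}, "call_id": "B"},
]
for problem in lint_plan(bad):
    print(problem)
```

These checks are purely structural; whether each reference is permitted by the tools' source and sink policies still has to be decided at call time.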



In [4]:
a = Agent("store a log that we just discovered nuclear fusion, and tell me how large the log is", tools_and_instructions)
a.process()
print("\n\nResults:")
a.get_results()

Model attempt: [
  {
    "call_id": "_",
    "method_name": "init",
    "parameter_dict": {
      "event": "We just discovered nuclear fusion!",
      "fmt": "%Y-%m-%d %H:%M:%S"
    }
    },
  {
    "call_id": "get_time_call",
    "method_name": "get_time",
    "parameter_dict": {
      "fmt": "_:fmt"
    }
    },
  {
    "call_id": "store_log_call",
    "method_name": "store_log",
    "parameter_dict": {
      "time": "get_time_call:time"
    }
    },
  {
    "call_id": "result",
    "method_name": "result",
    "parameter_dict": {
      "event": "_:event",
      "time": "get_time_call:time"
    }
    }
]
verifying param (fmt) access policy for get_time (source: _:fmt)
loading fmt = _:fmt
loaded fmt = %Y-%m-%d %H:%M:%S
Calling get_time with {'fmt': '%Y-%m-%d %H:%M:%S'}
Result: {'time': '2025-04-24 14:54:19'}
verifying param (time) access policy for store_log (source: get_time_call:time)
loading time = get_time_call:time
loaded time = 2025-04-24 14:54:19
Calling store_log with {'time':

{'event': 'We just discovered nuclear fusion!', 'time': '2025-04-24 14:54:19'}
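
The recorded plan can also be replayed through a two-sided check like the one `memory_fetch_safe` performs. This is an illustrative sketch, not something the notebook runs: `replay` is a hypothetical helper, the policy is the relevant subset of `TOOL_LIST`, and the plan is copied from the model output above. In the notebook, the `result` step is read with `memory_fetch_unsafe`, so no policy check applies to it.

```python
from typing import Dict, List, Tuple

# Relevant parts of TOOL_LIST, copied from the notebook (out_params omitted).
TOOLS = {
    "get_time": {
        "allowed_sources": {"fmt": ["_:fmt"]},
        "allowed_sinks": {"time": ["store_log:time"]},
    },
    "store_log": {
        "allowed_sources": {"time": ["get_time:time"]},
    },
}

# The plan from the recorded run, as (call_id, method_name, parameter_dict).
RECORDED_PLAN: List[Tuple[str, str, Dict[str, str]]] = [
    ("_", "init", {"event": "We just discovered nuclear fusion!", "fmt": "%Y-%m-%d %H:%M:%S"}),
    ("get_time_call", "get_time", {"fmt": "_:fmt"}),
    ("store_log_call", "store_log", {"time": "get_time_call:time"}),
    ("result", "result", {"event": "_:event", "time": "get_time_call:time"}),
]


def replay(plan, tools) -> Dict[str, bool]:
    """Map "call_id.param" to whether both ends of that data flow permit it."""
    tool_of = {call_id: method for call_id, method, _ in plan}
    tool_of["_"] = "_"  # the init step is a pseudo-tool with no sink policy
    verdicts: Dict[str, bool] = {}
    for call_id, method, params in plan[1:]:  # skip init: its values are literals
        for param, ref in params.items():
            src_call, out = ref.split(":", 1)
            src = tool_of[src_call]
            ok = True
            if src != "_":  # source side: is this destination an allowed sink?
                ok = f"{method}:{param}" in tools[src].get("allowed_sinks", {}).get(out, [])
            if method in tools:  # destination side; "result" has no policy entry
                ok = ok and f"{src}:{out}" in tools[method]["allowed_sources"].get(param, [])
            verdicts[f"{call_id}.{param}"] = ok
    return verdicts


for edge, ok in replay(RECORDED_PLAN, TOOLS).items():
    print(edge, "allowed" if ok else "BLOCKED")
# get_time_call.fmt allowed
# store_log_call.time allowed
# result.event allowed
# result.time BLOCKED
```

Under these assumptions, the `time` value in the final result would be blocked, because `get_time` lists only `store_log:time` as an allowed sink; it reaches the printed result because the `result` step skips the policy check.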