Hi maintainers — this is a friendly architecture-quality audit note, not a security report.
I’ve been building hermescheck, a small open-source scanner that reviews AI agent runtimes for state recovery, memory drift, tool boundaries, scheduler behavior, and other long-running agent failure modes. I ran it against modelscope/ms-agent because this repo is an active agent runtime/framework with real CLI, web UI, workflow, memory, and tool-execution surfaces.
Three runtime/architecture notes that looked worth sharing:
- Recovery currently looks transcript-heavy rather than fully resumable.
  `ms_agent/utils/utils.py` persists config + message history, and `ms_agent/llm/openai_llm.py` continues long generations, but I didn’t see an equally durable contract for environment state, side effects, or restarts around those flows. That can make interrupted work harder to resume deterministically.
- Web UI session state appears in-memory and explicit-delete driven.
  `webui/backend/session_manager.py` keeps sessions, messages, and deep-research events in process memory, and `webui/backend/websocket_handler.py` mainly cleans up task/runner references on stop or completion. This felt a bit fragile for server restarts or abandoned sessions if the goal is durable operator-facing workflows.
- Memory surfaces look split across several layers without one obvious freshness contract.
  I saw cache/history handling in `ms_agent/utils/utils.py`, long-term memory logic in `ms_agent/memory/default_memory.py`, and separate condenser/default-memory surfaces under `ms_agent/memory/`. The architecture may benefit from a clearer “authoritative current memory vs derived/archive/cache layers” rule.
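To make the last note a bit more concrete, here is a minimal sketch of what a freshness rule between an authoritative memory and its derived layers could look like. This is not ms-agent code: `AuthoritativeMemory`, `DerivedView`, and `build_summary` are hypothetical names I made up for illustration, and the join-based "summary" is just a stand-in for whatever a real condenser or cache would produce.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AuthoritativeMemory:
    """Hypothetical single source of truth for the current conversation state."""
    messages: List[str] = field(default_factory=list)
    version: int = 0  # bumped on every write

    def append(self, message: str) -> None:
        self.messages.append(message)
        self.version += 1


@dataclass
class DerivedView:
    """Hypothetical derived layer (cache, summary, archive index, ...)."""
    name: str
    built_from_version: int  # version of the source it was computed from
    payload: str

    def is_fresh(self, source: AuthoritativeMemory) -> bool:
        # A derived view is fresh only if the source has not changed since.
        return self.built_from_version == source.version


def build_summary(source: AuthoritativeMemory) -> DerivedView:
    # Placeholder "condenser": a real one would summarize, not just join.
    return DerivedView(
        name="summary",
        built_from_version=source.version,
        payload=" | ".join(source.messages),
    )


memory = AuthoritativeMemory()
memory.append("user: plan the trip")
summary = build_summary(memory)

memory.append("assistant: here is a draft")
if not summary.is_fresh(memory):
    # Stale derived data is rebuilt instead of being trusted silently.
    summary = build_summary(memory)
```

The only point of the sketch is the rule itself: every derived layer records which version of the authoritative state it came from, so a reader can always tell whether it is safe to use or needs rebuilding.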
None of the above is meant as “you must change this”; it just looked like a useful maintainer-facing architecture snapshot from an external pass.
Repo for the tool: https://github.com/huangrichao2020/hermescheck
If this isn’t useful, feel free to close and I won’t take it personally.