Description
Problem Statement
Currently, the AI agent interacts with the pseudo-terminal (PTY) using strictly atomic operations: `pty spawn`, `pty input`, `pty read`, and `pty kill`. While this faithfully mirrors how PTYs work under the hood, it creates severe tool-calling bloat and latency for the LLM.
For example, a simple interactive sequence of logging in over SSH and running `uname` requires the LLM to invoke tools 7 times:
`spawn` -> `write ssh user@host` -> `read` -> `write password` -> `read` -> `write uname` -> `read`.
This atomic structure forces the AI to explicitly request buffer reads after every single action, wasting time, consuming unnecessary tokens, and degrading the overall experience.
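To make the problem concrete, here is a minimal sketch of the atomic model using Python's standard-library `pty` module (the function names are illustrative, not the actual tool API): every input must be followed by an explicit, separate read call before the agent can see any output.

```python
import os
import pty
import select

def spawn(argv):
    # Atomic op: fork a child attached to a fresh PTY; returns the master fd.
    # Note: nothing is read here -- the agent must issue a separate read.
    pid, fd = pty.fork()
    if pid == 0:  # child process: become the target program
        os.execvp(argv[0], argv)
    return fd

def pty_input(fd, line):
    # Atomic op: send keystrokes only; output must be fetched separately.
    os.write(fd, line.encode() + b"\n")

def pty_read(fd, timeout=0.5):
    # Atomic op: drain whatever is currently in the output buffer.
    chunks = []
    while select.select([fd], [], [], timeout)[0]:
        try:
            data = os.read(fd, 4096)
        except OSError:
            break
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

# Every action needs its own follow-up read, e.g.:
# spawn -> pty_input("ssh user@host") -> pty_read -> pty_input(password) -> pty_read -> ...
```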
Proposed Solution
Refactor the PTY tool logic to use "compound" operations that bundle execution with immediate reads, effectively reducing the number of round trips required by the LLM.
The updated tool behaviors should be:
- `spawn`: Spawns the PTY, optionally accepts an initial input command, and automatically returns the first read buffer.
- `write`: Sends the keystrokes/command and automatically performs a read to return the resulting output.
- `read`: Kept as a standalone function strictly for polling the output of long-running, blocking commands (e.g., `ping -c 100 localhost`) where the AI needs to check on progress.
Under this new architecture, the same SSH login sequence drops from 7 tool calls to just 3:
`spawn (ssh user@host)` -> `write password` -> `write uname`.
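As a sketch of what the compound shape could look like (names and signatures here are illustrative, not the actual tool API), using Python's standard-library `pty` module:

```python
import os
import pty
import select

class CompoundPty:
    """Sketch of compound PTY operations; the API shown is an assumption."""

    def __init__(self):
        self.fd = None

    def spawn(self, argv, initial_input=None, timeout=0.5):
        # Compound spawn: start the child, optionally send an initial input,
        # and immediately return the first read buffer (banner/prompt/output).
        pid, self.fd = pty.fork()
        if pid == 0:  # child process: become the target program
            os.execvp(argv[0], argv)
        if initial_input is not None:
            os.write(self.fd, initial_input.encode() + b"\n")
        return self.read(timeout)

    def write(self, line, timeout=0.5):
        # Compound write: send the keystrokes, then automatically read and
        # return the resulting output -- no separate read round trip.
        os.write(self.fd, line.encode() + b"\n")
        return self.read(timeout)

    def read(self, timeout=0.5):
        # Standalone read, kept for polling long-running blocking commands.
        chunks = []
        while select.select([self.fd], [], [], timeout)[0]:
            try:
                data = os.read(self.fd, 4096)
            except OSError:
                break
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks).decode(errors="replace")
```

With this shape, the SSH sequence is literally three calls: `spawn(["ssh", "user@host"])`, `write(password)`, `write("uname")`, each of which already returns the output the agent needs.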
Use Case
- AI-Powered Cybersecurity Agents: Drastically speeding up multi-step interactive workflows, such as establishing reverse shells, navigating nested SSH jumps, or interacting with interactive privilege escalation scripts (e.g., `sudo`).
- Reduced Token Consumption: Cutting the context overhead by eliminating redundant `<tool_call>read</tool_call>` outputs from the LLM's thought process.
I already have a version built on the new approach, and it runs noticeably faster than the current implementation.
Additional Context
This "compound action" approach has been validated by major industry players: Anthropic recently updated their computer-use/bash tools to follow the same paradigm, combining command execution with output reading to minimize unnecessary round trips.
