MAPL: Model Action Programming Language

Large language models equipped with the ability to call external tools currently operate in a serial loop: the model emits one tool call, waits for the result, receives the entire context again, decides on the next action, and repeats. This architecture incurs unnecessary round-trips to the model, duplication of context tokens, and high cumulative response time, even when the branching logic between tools is simple and deterministic. Prior work such as LLMCompiler (Kim et al., 2024) addressed parallelizing independent tool calls but did not propose a mechanism by which the model expresses conditions, early exits, and branches directly in its output, so that a deterministic runtime can execute them without additional calls to the model. Newer approaches such as Anthropic's Programmatic Tool Calling (2025) allow a model to write Python code that orchestrates tools, but they rely on a general-purpose (Turing-complete) programming language that requires a sandbox, does not guarantee termination, and sits in a layer external to the model rather than in its internal format.

In this paper, I propose MAPL (Model Action Programming Language), a compact and intentionally limited (non-Turing-complete) domain-specific language (DSL) that is encoded directly as special tokens in the internal format of language models, at the same layer where <|tool_call|>, <|eom_id|>, and <|python_tag|> are encoded today. The model emits a complete MAPL program as part of its output, and a deterministic runtime executes it. MAPL allows a model to describe, in a single call, a complete program that includes serial and parallel tool calls, conditions on intermediate results, silent early exits, and different call types (blocking or fire-and-forget). I theoretically analyze the savings in model calls, tokens, and response time; in typical scenarios of two to five tools with branches, MAPL appears to reduce tool calls and response time by 20%-50%, with the most significant benefit in automation workflows that often terminate in an early exit without any response.

1. Introduction

1.1 Background: Tool Calling as a Central Paradigm

The ability of large language models to call external tools - APIs, databases, search engines, and other services - has become a key enabler for deploying AI agents in production systems. Since the introduction of function calling in GPT-4 (OpenAI, 2023) and its extension to open models such as Llama 3.1 (Dubey et al., 2024) and Mistral (Jiang et al., 2024), leading vendors such as OpenAI, Anthropic, Google, Mistral, and Meta have built support for tool calling directly into their models, through dedicated training on structured formats.

Technically, tool calling is implemented using special tokens - tokens that are not part of the natural language but serve as structural instructions for the model and the runtime environment. For example, Llama 3 uses <|python_tag|> to mark the start of a tool call and <|eom_id|> to indicate that the model expects a result before continuing, while <|eot_id|> marks the end of a full turn. Mistral defines dedicated control tokens such as [TOOL_CALLS], [TOOL_RESULTS], and [/TOOL_RESULTS]. The Hermes (NousResearch) format uses <tool_call> and </tool_call>. OpenAI's ChatML format uses <|im_start|> and <|im_end|> to delimit roles in a conversation. All of these approaches share a common structure: the model emits a special token that marks a tool call, the runtime performs the call, and the result is fed back to the model as a new message. The model then reprocesses the entire context and decides on the next step.

1.2 The Problem: Unnecessary Round-Trips

The current architecture for using tools is based on the paradigm introduced by Yao et al. (2023) in their work on ReAct: the model thinks, decides on an action, executes it, receives an observation, thinks again, and repeats until it reaches a final answer. This paradigm is intuitive and flexible, but it imposes a rigid serial execution structure: each step requires a full round-trip to the model in which the entire context - system instructions, tool definitions, user message, conversation history, and previous tool results - is re-sent. In a process involving N tool calls, this typically requires N+1 model calls, with the token cost growing linearly at each step.
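To make this concrete, consider an illustrative calculation (hypothetical numbers, not measurements): suppose the fixed context - system prompt, tool schemas, user message - is about 2,000 tokens and each tool result adds about 100 tokens. A workflow with N = 3 tool calls then requires 4 model calls whose prompts total roughly 2,000 + 2,100 + 2,200 + 2,300 = 8,600 tokens, more than four times the prompt cost of a single call that emits the entire plan at once.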

The problem becomes more acute when considering the nature of the decisions the model makes between tool calls. In most workflows these decisions are deterministic rather than semantic: “if no search results - stop”, “if results - move to next tool”, “if status is X - call tool Y”. There is no need for a language model to check if a list is empty - this is a check that simple code can perform. The model already “knows” in advance what the logic is, but the existing protocol forces it to discover it step by step.

In addition to context duplication and unnecessary judgment calls, there is the problem of cumulative response time. A model call typically takes 1-3 seconds and a tool call 0.5-2 seconds, so a serial process of four model calls and three tool calls can take 8-15 seconds, even when the logic is simple.

1.3 Limitations of existing approaches

Subsequent work has identified this inefficiency. LLMCompiler (Kim et al., 2024) proposed an architecture inspired by classical compilers: the model generates a dependency graph (DAG) of tasks and an execution unit runs independent tasks in parallel. This approach significantly reduced response time (up to 3.7x) and cost (up to 6.7x) compared to ReAct. However, LLMCompiler does not support conditions (if/else) on tool results or early exits; when a branch is required, the model returns to a "replanning" state, which is effectively another round-trip.

ReWOO (Xu et al., 2023) separated the planning phase from the execution phase: the model generates a complete plan in advance and workers execute the tools. But here too there are no conditions - the plan is a fixed sequence. The Plan-and-Solve approach (Wang et al., 2023) likewise separates planning from execution, but the plan is a linear list of steps without branching.

CodeAct (Wang et al., 2024) proposed that the model emit real Python code that executes tool calls. This allows conditionals and loops, but it requires a full Python sandbox, raises security risks, and makes the program difficult to validate statically.

Anthropic's (2025) Programmatic Tool Calling (PTC) goes a step further, allowing the Claude model to write Python code that orchestrates tool calls in a sandboxed environment. Intermediate results remain in the runtime environment and never enter the model's context. This approach has been shown to save 37% of tokens on complex research tasks and to improve accuracy on internal benchmarks. However, PTC relies on Python - a Turing-complete language - which requires a code-execution sandbox, does not guarantee termination, and is an external layer (an API-level feature) rather than part of the model's internal format. (As an aside, I had thought of this idea independently beforehand.)

PASTE (Sui et al., 2026) proposed speculative tool execution based on recurring flow patterns. FlowMind (Liu et al., 2026) proposed an Execute-Summarize framework in which the model first executes and then extracts a structured workflow. OrchDAG (Lu et al., 2025) models tool execution as a DAG with controlled complexity and uses RLVR for training. Quasar (Mell et al., 2025) proposes a new programming language for model code actions, with automated validation, uncertainty quantification, and security features - but it is still an external programming language that the model writes as text. Semantic Router DSL (Chen et al., 2026) offers a declarative policy language for inference routing and agent orchestration, but it focuses on routing policy - which model to choose, whether to authorize a tool call - rather than on the orchestration of the tools themselves.

All existing approaches share a common assumption: that orchestration happens outside the model's internal format, as an external layer (Python, an external DSL, a framework). The gap that remains open is clear: there is no existing mechanism by which the model expresses conditional logic directly in its token format, as a natural part of its output, so that it is executed deterministically without any round-trip.

1.4 The Programmer Metaphor

The central intuition of MAPL is a change of role: the model goes from a "worker" that executes one step at a time and waits for instructions to a "programmer" that writes a full conditional program. A deterministic, lightweight runtime executes the program and returns to the model only when judgment that only a language model can provide is required, such as formulating a natural-language answer.

This serial, step-by-step behavior is not an intrinsic property of language models but a limitation of the output protocol. Modern models are capable of producing complex structures - JSON, code, markup - and there is no reason in principle that they cannot produce a full conditional program to be executed by a deterministic runtime. This is the central proposition of this paper.

2. The MAPL Language

2.1 Design Principles

MAPL is designed according to four guiding principles.

The first is token-level implementation. MAPL is encoded as special tokens at the same level as tool calls are encoded today. It is not text that the model writes, but part of the model's vocabulary. This is the most fundamental distinction from approaches like CodeAct, PTC, and Quasar where the model writes code as plain text.

The second is intentional limitation. MAPL is non-Turing-complete; it has no loops, recursion, or mutable state. There are only conditions, sequences, and parallel execution. This limitation ensures that every MAPL program terminates, and that a deterministic runtime can execute it without the risk of infinite loops and without the need for a sandbox.

The third is minimalism. MAPL adds a small number of primitives on top of existing tool calling. Anything that can be expressed in a regular tool call can be expressed in MAPL. MAPL adds only the ability to chain, condition, and execute concurrently. There are no global variables, no user-defined functions. These would expand the model's error space without sufficient benefit.

The fourth is gradualism. A model can emit a regular tool call (without MAPL) for simple tasks, and a MAPL program only when there is a value for conditional logic. The system does not force a DSL on any interaction. The distinction is made by the first special token that the model emits: <|tool_call|> for a regular call, <|program_start|> for a MAPL program.
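To illustrate the dispatch, here is a minimal Python sketch of how a runtime might route an assistant turn based on its first special token (the token names are those defined in Section 2.3; everything else is assumed for illustration):

def classify_output(model_output: str) -> str:
    """Classify an assistant turn by the first special token it emits."""
    text = model_output.lstrip()
    if text.startswith("<|program_start|>"):
        return "mapl_program"  # hand the whole turn to the MAPL runtime
    if text.startswith("<|tool_call|>"):
        return "tool_call"     # existing single-call path
    return "text"              # plain natural-language answer for the user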

2.2 Primitives

MAPL defines five basic primitives.

call - A blocking tool call. A standard tool call in which the runtime waits for a result before continuing. It includes the tool name, input parameters, and an identifier that allows the result to be referenced later in the program. The result is stored in the program's namespace and is accessible to subsequent steps.

fire - A non-blocking (fire-and-forget) tool call. The runtime dispatches the call but immediately proceeds to the next step, without waiting for the result and without storing it in state. Useful for side operations such as logging, sending an alert, or updating status.

if - A condition on the result of a previous tool. The condition is expressed over fields of the tool result using simple operators: numeric and string comparison, field-existence checks, null/empty checks, and logical operators (and, or, not). The condition dictates which branch is executed - each if contains a then branch and optionally an else branch. (A sketch of a condition evaluator follows this list.)

parallel - Parallel execution of multiple independent tool calls. The runtime runs them all at the same time and collects the results before continuing.

finish - Terminates the program. Can be of the respond type (return to the model to formulate a final answer, with the runtime passing the accumulated state to the model) or of the silent type (end without any response at all - the runtime does not return to the model and does not emit an answer to the user).
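To show how small the condition language is, here is a minimal Python sketch of an evaluator for single comparisons like those in Section 2.4. It assumes conditions are whitespace-separated triples; a real runtime would parse a proper grammar and add the field-existence, null/empty, and and/or/not operators:

import operator
from typing import Any

# Comparison operators permitted in <|if|> conditions.
_OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def resolve(path: str, state: dict) -> Any:
    """Resolve a dotted reference such as 'search.result_count'
    against the namespace of stored tool results."""
    obj: Any = state
    for part in path.split("."):
        obj = obj[part]
    return obj

def eval_condition(expr: str, state: dict) -> bool:
    """Evaluate a single comparison, e.g. 'search.result_count == 0'."""
    lhs, op, rhs = expr.split()
    right: Any = int(rhs) if rhs.lstrip("-").isdigit() else rhs.strip('"')
    return _OPS[op](resolve(lhs, state), right)

# eval_condition("search.result_count == 0", {"search": {"result_count": 3}})
# returns False, so the runtime would take the else branch.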

2.3 Special Tokens

MAPL defines the following special tokens:

<|program_start|> - Start of MAPL program
<|program_end|>   - End of MAPL program
<|call|>          - Blocking tool call
<|fire|>          - Non-blocking tool call
<|if|>            - Start of condition
<|else|>          - Else branch
<|endif|>         - End of condition
<|parallel|>      - Start of parallel block
<|end_parallel|>  - End of parallel block
<|finish|>        - Finish (with silent or respond parameter)
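One possible in-memory representation after the runtime parses these tokens (a sketch in Python; the class and field names are my own, not part of any specification):

from dataclasses import dataclass, field
from typing import Optional, Union

@dataclass
class Call:            # <|call|> - blocking; result stored under `id`
    id: str
    tool: str
    params: dict

@dataclass
class Fire:            # <|fire|> - dispatched, result discarded
    tool: str
    params: dict

@dataclass
class If:              # <|if|> ... <|else|> ... <|endif|>
    condition: str     # e.g. "search.result_count == 0"
    then: list
    orelse: list = field(default_factory=list)

@dataclass
class Parallel:        # <|parallel|> ... <|end_parallel|>
    calls: list        # independent Call nodes, executed concurrently

@dataclass
class Finish:          # <|finish|> silent | respond {...}
    mode: str          # "silent" or "respond"
    payload: Optional[dict] = None

Node = Union[Call, Fire, If, Parallel, Finish]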

2.4 Full Examples

Example 1: Checking for new emails

Scenario: "Check for new emails. If not, do not respond. If there are, filter duplicates. If there is nothing new after filtering, do not respond. If there is something new, summarize."

In normal tool calling, the process requires 3-4 round-trips:

<|start_header_id|>assistant<|end_header_id|>

<|python_tag|>{"name": "gmail.search", "parameters": {"query": "is:unread newer_than:1d"}}<|eom_id|>

<|start_header_id|>ipython<|end_header_id|>

{"result_count": 3, "results": [{"id": "m1", ...}, {"id": "m2", ...}, {"id": "m3", ...}]}<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

<|python_tag|>{"name": "memory.deduplicate", "parameters": {"items": [...]}}<|eom_id|>

<|start_header_id|>ipython<|end_header_id|>

{"new_count": 2, "new_items": [{"id": "m1", ...}, {"id": "m3", ...}]}<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

Here is a summary of your new emails:
1. ...<|eot_id|>

Each <|start_header_id|>assistant<|end_header_id|> block is a separate call to the model, with the entire context re-sent.

In MAPL the model emits everything in one call:

<|start_header_id|>assistant<|end_header_id|>

<|program_start|>
<|call|> id=search tool=gmail.search {"query": "is:unread newer_than:1d"}
<|if|> search.result_count == 0
  <|finish|> silent
<|else|>
  <|call|> id=dedupe tool=memory.deduplicate {"items": "{search.results}"}
  <|if|> dedupe.new_count == 0
    <|finish|> silent
  <|else|>
    <|finish|> respond {"new_emails": "{dedupe.new_items}"}
  <|endif|>
<|endif|>
<|program_end|><|eot_id|>

The runtime receives the program and executes it:

(1) interprets the program between <|program_start|> and <|program_end|>

(2) runs gmail.search and gets {result_count: 3, results: [...]}

(3) evaluates search.result_count == 0 - false, so it takes the else branch

(4) runs memory.deduplicate and gets {new_count: 2, new_items: [...]}

(5) evaluates dedupe.new_count == 0 - false, so it takes the else branch

(6) reaches <|finish|> respond and returns only {new_emails: [...]} to the model

Only now, if a natural-language answer needs to be formulated, is the model called a second time - and it receives only the new emails, not the entire history.

In the scenario where there are no new emails, the runtime makes one call to gmail.search, sees that the result is empty, reaches <|finish|> silent, and stops. The model is called only once - to produce the program.
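The walkthrough above can be compressed into a small interpreter. The following Python sketch mirrors steps (1)-(6); it reuses the node classes from Section 2.3 and the eval_condition helper from Section 2.2, and assumes tools is a mapping from names such as "gmail.search" to Python callables. It is an illustration of the execution model, not a production runtime:

from concurrent.futures import ThreadPoolExecutor

def run_program(nodes, tools, state=None):
    """Walk a parsed MAPL program; returns a finish record of the form
    {"mode": "silent" | "respond", "payload": ..., "state": ...}."""
    state = {} if state is None else state
    for node in nodes:
        if isinstance(node, Call):        # blocking: wait and store the result
            state[node.id] = tools[node.tool](**node.params)
        elif isinstance(node, Fire):      # fire-and-forget: do not wait
            ThreadPoolExecutor(max_workers=1).submit(tools[node.tool], **node.params)
        elif isinstance(node, Parallel):  # run all calls concurrently, then join
            with ThreadPoolExecutor() as pool:
                futures = {c.id: pool.submit(tools[c.tool], **c.params)
                           for c in node.calls}
            state.update({cid: f.result() for cid, f in futures.items()})
        elif isinstance(node, If):        # deterministic branch on stored results
            branch = node.then if eval_condition(node.condition, state) else node.orelse
            finished = run_program(branch, tools, state)
            if finished is not None:      # a nested finish ends the whole program
                return finished
        elif isinstance(node, Finish):
            payload = node.payload if node.mode == "respond" else None
            return {"mode": node.mode, "payload": payload, "state": state}
    return None

For the email example, a respond finish returns only {"new_emails": [...]} to the model, while a silent finish returns nothing and the interaction simply ends.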

2.5 Validation

Before executing a program, the runtime performs static validation checks:

(1) Every reference points to an id defined earlier in the execution tree

(2) Every if contains at least one then branch

(3) Every tool called exists in the schema of available tools

(4) Every path in the tree ends with finish

A program that fails validation is not executed; instead, it is returned to the model with an error message for correction, similar to how tool calls with invalid parameters are handled today. A minimal sketch of such a validator follows.
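Here is one way checks (1)-(4) could look over the node classes from Section 2.3 (illustrative only; a real validator would also check parameter schemas, and this sketch assumes single-level references such as search.result_count):

import re

def refs_in(condition: str) -> set:
    """Result ids referenced by a condition, e.g. {'search'}
    for 'search.result_count == 0'."""
    return {m.group(1) for m in re.finditer(r"\b([A-Za-z_]\w*)\.", condition)}

def ends_with_finish(nodes) -> bool:
    last = nodes[-1] if nodes else None
    if isinstance(last, Finish):
        return True
    if isinstance(last, If):  # both branches must themselves terminate
        return ends_with_finish(last.then) and ends_with_finish(last.orelse)
    return False

def validate(nodes, known_tools, defined_ids=frozenset()) -> list:
    """Return a list of error strings; an empty list means the program may run."""
    errors, ids = [], set(defined_ids)
    for node in nodes:
        if isinstance(node, Parallel):
            calls = node.calls
        elif isinstance(node, (Call, Fire)):
            calls = [node]
        else:
            calls = []
        for c in calls:
            if c.tool not in known_tools:                  # check (3)
                errors.append(f"unknown tool: {c.tool}")
            if isinstance(c, Call):
                ids.add(c.id)
        if isinstance(node, If):
            if not node.then:                              # check (2)
                errors.append("if without a then branch")
            errors += [f"undefined reference: {r}"         # check (1)
                       for r in refs_in(node.condition) if r not in ids]
            errors += validate(node.then, known_tools, ids)
            errors += validate(node.orelse, known_tools, ids)
    if not ends_with_finish(nodes):                        # check (4)
        errors.append("path does not end with finish")
    return errors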

2.6 What MAPL does not include

MAPL does not include loops (for/while), because loops over tool results require a complex runtime with state management and introduce the risk of unbounded execution. MAPL does not include variables or mutable state - the result of each tool is available under the identifier that the model defined (e.g. search, dedupe), but there is no nested namespace or scoping. MAPL does not include dynamic tool calls - the tool name and parameters must be known at output time, not computed from results. And MAPL does not include text generation - any response that requires natural language must go through a finish respond, which returns control to the model.

These limitations are intentional. MAPL does not attempt to replace general-purpose programming languages; it only allows the model to express simple conditional logic over tool calls in a way that the runtime can execute safely and deterministically.

3. Savings Analysis

The savings estimates for MAPL relative to serial tool calling are rough estimates only, not the result of empirical testing or field measurement. In principle, in the standard ReAct paradigm the model is called again after almost every tool call, so a process with several tools can require a large number of model calls that repeatedly accumulate the same context.

In contrast, with MAPL the model can emit in advance an execution plan that includes tool calls, conditions, early exits, parallel execution, and a decision about whether to return a response to the user at all.

Therefore, in scenarios with several technical steps or simple branches, the savings may plausibly be significant: roughly 40%-80% of tokens and model calls in multi-tool processes, and sometimes even more in automations that often end in a silent finish, such as news monitoring, email checking, or inventory checks.
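As an illustration only (counting calls, not measuring them): in the email scenario of Section 2.4, when no new mail arrives, serial tool calling needs two model calls - one to emit the search, one to observe the empty result and stop - while MAPL needs one, a 50% reduction. When there is new mail, the serial flow needs three model calls against MAPL's two, and the second MAPL call receives only the deduplicated emails rather than the accumulated history.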

On the other hand, in simple tasks with only one tool or in tasks that require semantic judgment after each step, the savings are expected to be smaller.

These numbers should therefore be read as an initial estimate intended to illustrate the savings potential, not as a measured or proven claim; substantiating them would require a well-established benchmark.

4. Training

4.1 Training on MAPL

MAPL requires that language models be trained to produce programs in this language, similar to how models are currently trained to produce tool calls in JSON format.

The training process extends the existing paradigm: the special tokens of MAPL (<|program_start|>, <|program_end|>, <|call|>, <|fire|>, <|if|>, <|else|>, <|endif|>, <|parallel|>, <|end_parallel|>, <|finish|>) are added to the vocabulary, training examples of correct MAPL programs for different workflows are generated, and the model is trained to produce them.
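As one concrete possibility, with the Hugging Face transformers library the vocabulary extension could look as follows (the base checkpoint name is purely illustrative):

from transformers import AutoTokenizer, AutoModelForCausalLM

MAPL_TOKENS = ["<|program_start|>", "<|program_end|>", "<|call|>", "<|fire|>",
               "<|if|>", "<|else|>", "<|endif|>", "<|parallel|>",
               "<|end_parallel|>", "<|finish|>"]

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

# Register the MAPL tokens and grow the embedding matrix to match.
tokenizer.add_special_tokens({"additional_special_tokens": MAPL_TOKENS})
model.resize_token_embeddings(len(tokenizer))

# Supervised fine-tuning on (prompt, MAPL program) pairs then proceeds
# exactly as fine-tuning for JSON tool calling does today.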

The model also learns when to use MAPL and when to use a regular tool call. The distinction is made according to the first special token emitted.

4.2 Recommendation: Large Models Only

Due to the relative complexity of producing correct MAPL programs (compared to a single tool call), I recommend only enabling the MAPL capability on large models of the order of 70B parameters or more. Smaller models may produce incorrect programs at too high a rate, negating the benefit. Smaller models (8B-30B) should only be considered if they undergo dedicated fine-tuning on MAPL with a sufficiently large and diverse dataset.

Sources

Anthropic. (2024). Introducing the Model Context Protocol. https://www.anthropic.com/news/model-context-protocol

Anthropic. (2025). Introducing advanced tool use on the Claude Developer Platform. https://www.anthropic.com/engineering/advanced-tool-use

Chen, H., et al. (2026). From Inference Routing to Agent Orchestration: The Semantic Router DSL. arXiv:2603.27299.

Cheng, J., Liu, X., Zhang, Z., Wen, H., Zhang, Z., Yin, Q., Li, S., Nigam, P., Yin, B., Zhang, C., & Song, Y. (2026). Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards. arXiv:2603.24709.

Dubey, A., et al. (2024). The Llama 3 Herd of Models. arXiv:2407.21783.

Jiang, A. Q., et al. (2024). Mistral 7B. arXiv:2310.06825.

Kim, S., Moon, S., Tabrizi, R., Lee, N., Mahoney, M. W., Keutzer, K., & Gholami, A. (2024). An LLM Compiler for Parallel Function Calling. In Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR 235.

Liu, Y., et al. (2026). FlowMind: Execute-Summarize for Structured Workflow Generation from LLM Reasoning Traces. arXiv:2602.11782.

Lu, Y., Liu, S., & Dong, L. (2025). OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs. arXiv:2510.24663.

Mell, S., et al. (2025). A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions. arXiv:2506.12202.

Sui, Y., et al. (2026). Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution (PASTE). arXiv:2603.18897.

Wang, L., Xu, W., Lan, Y., Hu, Z., Lan, Y., Lee, R. K.-W., & Lim, E.-P. (2023). Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models. In Proceedings of ACL 2023.

Wang, X., Wang, Y., Liu, J., Chen, Y., Yuan, L., Peng, H., & Ji, H. (2024). Executable Code Actions Elicit Better LLM Agents. ICML 2024. arXiv:2402.01030.

Xu, B., Peng, Z., Lei, B., Mukherjee, S., Liu, Y., & Xu, D. (2023). ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models. arXiv:2305.18323.

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
