### Autonomous Code-Executing Agent Using Claude 3.5 Sonnet

This notebook shows an example use of LangCode. An AI agent powered by Claude 3.5 Sonnet writes Python code inside XML tags, and a LangGraph workflow extracts that code and executes it in a LangCode Jupyter environment. The same approach can be used to build local code-executing assistants that interact with the OS.

In [None]:
%env ANTHROPIC_API_KEY=<key>

In [None]:
# Create the notebook environment. Each time this cell runs, a new notebook is created.

from langcode.jupyter import Jupyter

jupyter = Jupyter.local()

In [None]:
# Define Agent State for LangGraph

from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage
import operator


class SystemState(TypedDict):
    system_name: str
    user_username: str
    user_realname: str
    internet_access: str
    date_time: str


class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    system_state: SystemState  # set once in the inputs; no node updates it, so no reducer is needed
    temperature: float
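
In LangGraph, the second argument of `Annotated` acts as a reducer: when a node returns a partial update, the existing value and the update are combined through it instead of the field being overwritten. Below is a minimal sketch of that merge for the `messages` field, using plain strings in place of `BaseMessage` objects. The `apply_update` helper is hypothetical and only imitates the behavior; it is not LangGraph's implementation.

```python
import operator


# Hypothetical stand-in for how a reducer-annotated field gets updated:
# the new value is reducer(old_value, update) rather than a plain overwrite.
def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        reducer = reducers.get(key)
        if reducer is not None and key in state:
            merged[key] = reducer(state[key], value)
        else:
            merged[key] = value
    return merged


reducers = {"messages": operator.add}  # mirrors Annotated[..., operator.add]
state = {"messages": ["user: hi"], "temperature": 0.5}

# A node returns only the new message; the reducer appends it to the history.
state = apply_update(state, {"messages": ["assistant: hello"]}, reducers)
print(state["messages"])
```

This is why the nodes in this notebook return `{"messages": [response]}` with a single-element list: the list is concatenated onto the conversation so far.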

In [None]:
# Define Prompt for AI Agent

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


system_prompt = (
    """
    You are an autonomous AI assistant running on the user's local machine.
    You have access to a stateful Jupyter Notebook. Each new call has the context (variables, imports, etc.) of previous calls.
    You can create and run a cell by writing Python code inside <python></python> XML tags, with no ``` or other markup.
    After you have written a <python></python> call, stop your message. The system will extract the XML call, execute it in Jupyter, and return the output to you in the next message.
    You can use your code-execution ability to interact with the local machine and the internet, and generally to carry out the user's objectives.
    While you can execute code -> receive response -> execute code again, remember not to create infinite loops beyond the user's request.
    Do not ask the user to execute code and return the output, and do not write things like 'Waiting until code execution ends...'. Code is executed AUTOMATICALLY whenever your last finished message contains a python call in XML notation.
    You MUST plan before executing code, and you must keep each call reasonably compact, as you are working in a Jupyter Notebook.
    Try not to do more than the user asks. If the user's goals are too broad, try to clarify them or come up with specific objectives yourself.

    Here is an example (text inside || describes what happens under the hood):

    User: 
    I want you to do X if X is Z, if it is not Z, do Y!

    Assistant:
    Sure!

    First, I will try to access X to inspect its nature.

    <python>
    ... Code where you work to access X ...
    </python>

    |End of your message|

    |System detects <python></python> call and executes the code, and returns as a user message|

    User: // Note: It's not the actual user, but data from userspace where code was executed.
    The X is Z, blah blah blah...

    Assistant:

    Great! X is Z!

    Now let's proceed to second step... (Execute code again, reflect, etc.)

    Use the system information to make your output better:

    OS: {system_name}
    User's username: {user_username}
    User's real name: {user_realname}
    Internet access: {internet_access}
    Date and time: {date_time}
    """
).strip()

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

In [None]:
# Define Model

from langchain_anthropic import ChatAnthropic


def model(state: AgentState):
    chain = prompt | ChatAnthropic(temperature=state["temperature"], model="claude-3-5-sonnet-20240620")  # type: ignore

    response = chain.invoke(
        {
            "messages": state["messages"],
            "system_name": state["system_state"]["system_name"],
            "user_username": state["system_state"]["user_username"],
            "user_realname": state["system_state"]["user_realname"],
            "internet_access": state["system_state"]["internet_access"],
            "date_time": state["system_state"]["date_time"],
        }
    )

    return {"messages": [response]}

In [None]:
# Define XML Parser

import xml.etree.ElementTree as ET
from typing import Optional
import re


class ToolParser:
    """Parses XML tool calling markup from LLM output."""

    @staticmethod
    def xml_to_dict(xml_string: str) -> Optional[dict]:
        """Parse XML string to dictionary, including inner text for simple elements."""
        try:
            root = ET.fromstring(xml_string)
            result_dict = {root.tag: {} if root.attrib or list(root) else root.text}
            for child in list(root):
                if child.text:
                    result_dict[root.tag][child.tag] = child.text.strip()  # type: ignore
                else:
                    result_dict[root.tag][child.tag] = child.attrib  # type: ignore
            return result_dict
        except ET.ParseError as e:
            print(f"Error parsing XML: {e}")
            return None

    @staticmethod
    def extract_and_parse_xml(raw_string: str) -> list[dict]:
        """Extract XML strings and convert to dictionary with enhanced regex."""
        # Match <tag ...>body</tag> pairs; the backreference \1 requires the closing tag to match the opening one
        xml_pattern = r"<(\w+)[^>]*>(.*?)</\1>"
        xml_strings = re.findall(xml_pattern, raw_string, re.DOTALL)

        # Parse all found XML strings to dictionaries
        parsed_xmls = []
        for tag, body in xml_strings:
            if tag == "python":
                # Python source may contain "<" or "&", which are invalid in XML text,
                # so keep the code as raw text instead of parsing it as XML
                parsed_xmls.append({tag: body})
                continue
            parsed_xml = ToolParser.xml_to_dict(f"<{tag}>{body}</{tag}>")
            if parsed_xml:
                parsed_xmls.append(parsed_xml)

        return parsed_xmls
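
To make the extraction step concrete, here is a small sketch that uses only the standard library and the same regular expression on a made-up model reply. It shows what the two capture groups contain, and it also illustrates a general caveat: a strict XML parser rejects bodies containing characters such as `<`, which are common in Python code. The `reply` text is an invented example.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as in the parser: group 1 is the tag name, group 2 the body.
XML_PATTERN = r"<(\w+)[^>]*>(.*?)</\1>"

reply = (
    "I will list the files first.\n\n"
    "<python>\nimport os\nprint(os.listdir('.'))\n</python>"
)

# re.DOTALL lets "." match newlines, so multi-line code is captured whole.
matches = re.findall(XML_PATTERN, reply, re.DOTALL)
for tag, body in matches:
    print(tag, repr(body))

# Python code is not necessarily well-formed XML: a bare "<" is an invalid token.
try:
    ET.fromstring("<python>print(1 < 2)</python>")
    strict_parse_ok = True
except ET.ParseError:
    strict_parse_ok = False
print("strict XML parse succeeded:", strict_parse_ok)
```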

In [None]:
# Define Executor and Router

from langchain_core.messages import HumanMessage

def executor(state: AgentState):
    xml = ToolParser.extract_and_parse_xml(state["messages"][-1].content)  # type: ignore

    for call in xml:
        if call.get("python", None):
            result = jupyter.run_cell(call["python"], timeout=600000)

            images = []

            for image in result.images:
                images.append(
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/{image.content_format};base64,{image.content}"
                        },
                    }
                )

            return {
                "messages": [
                    HumanMessage(
                        content=[
                            {"type": "text", "text": result.text}
                        ] + images
                    )
                ]
            }


def router(state: AgentState):
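    xml = ToolParser.extract_and_parse_xml(state["messages"][-1].content.strip())  # type: ignore

    # Route to the executor only when the reply contains a <python> call;
    # other XML-like tags in the text must not trigger execution
    if any(call.get("python") for call in xml):
        return "execute"
    else:
        return "end"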

In [None]:
# Define StateGraph

from langgraph.graph import END, StateGraph

graph = StateGraph(AgentState)

graph.add_node("model", model)
graph.add_node("executor", executor)

graph.add_conditional_edges("model", router, {"execute": "executor", "end": END})
graph.add_edge("executor", "model")

graph.set_entry_point("model")

runnable = graph.compile()
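
The graph above encodes a simple cycle: model, then router, then either executor (and back to model) or end. The sketch below expresses the same control flow as a plain Python loop, with no LangGraph or LangCode involved. `fake_model`, `fake_run_code`, `run_loop`, and the `max_steps` cap are all hypothetical names introduced for illustration; the cap is comparable in spirit to LangGraph's `recursion_limit` run setting.

```python
import re
from typing import Callable, Optional

PYTHON_CALL = re.compile(r"<python>(.*?)</python>", re.DOTALL)


def extract_code(message: str) -> Optional[str]:
    """Return the body of the first <python> call, or None if there is none."""
    match = PYTHON_CALL.search(message)
    return match.group(1) if match else None


def run_loop(model: Callable[[list], str], run_code: Callable[[str], str],
             messages: list, max_steps: int = 5) -> list:
    """model -> router -> executor -> model, until a reply contains no call."""
    messages = list(messages)
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)
        code = extract_code(reply)
        if code is None:  # router decision: "end"
            break
        messages.append(run_code(code))  # executor output is fed back to the model
    return messages


# Stub model and executor, only for illustration.
def fake_model(messages: list) -> str:
    if len(messages) == 1:
        return "Let me check.\n<python>1 + 1</python>"
    return f"The result is {messages[-1]}."


def fake_run_code(code: str) -> str:
    return str(eval(code))  # a real executor would run the cell in Jupyter


history = run_loop(fake_model, fake_run_code, ["What is 1 + 1?"])
print(history[-1])
```

Keeping the routing decision separate from execution, as the graph does, means the loop ends as soon as the model answers without a `<python>` call.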

In [None]:
# Invoke

from datetime import datetime
from langchain_core.messages import HumanMessage
from IPython.display import Image, display
import base64

inputs = {
    "messages": [
        HumanMessage(
            content="Hi! Could you please take a screenshot of my screen and tell me what you see?"
        )
    ],
    "temperature": 0.5,
    "system_state": {
        "system_name": "Linux",
        "user_username": "keell0renz",
        "user_realname": "Bohdan Agarkov",
        "internet_access": "Enabled",
        "date_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    },
}

async for output in runnable.astream_log(inputs, include_types=["llm"]):
    for op in output.ops:
        if op["path"] == "/streamed_output/-":
            if op["value"].get("executor", None):
                for message in op["value"]["executor"]["messages"]:
                    print("\n\n<output>")
                    print(message.content[0]["text"])
                    
                    for image_base64_obj in message.content[1:]:
                        image_base64_str = image_base64_obj["image_url"]["url"].split(",")[1]
                        image_data = base64.b64decode(image_base64_str)
                        display(Image(data=image_data))

                    print("</output>\n")

        if op["path"].startswith("/logs/") and op["path"].endswith(
            "/streamed_output/-"
        ):
            print(op["value"].content, end="")

# Later you can re-run the second cell to reset the Jupyter notebook state. Clearing logic will be added soon.
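
The executor passes images back as base64 data URLs of the form `data:image/<format>;base64,<payload>`, and the streaming loop recovers the bytes by splitting on the comma. A minimal standard-library sketch of that round trip follows. The `decode_data_url` helper and the sample payload are hypothetical; the payload is just a few placeholder bytes, not a real image.

```python
import base64


def decode_data_url(url: str) -> tuple[str, bytes]:
    """Split data:<mime>;base64,<payload> into (mime, decoded bytes)."""
    header, payload = url.split(",", 1)
    mime = header.removeprefix("data:").removesuffix(";base64")
    return mime, base64.b64decode(payload)


# Build a sample URL the same way the executor does, then decode it again.
raw = b"\x89PNG"
sample = "data:image/png;base64," + base64.b64encode(raw).decode("ascii")

mime, data = decode_data_url(sample)
print(mime, data)
```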