## RAG-Assisted Auto Developer
-- with LionAGI, LlamaIndex, AutoGen and the OpenAI code interpreter


Let us build a dev bot that can
- read and understand lionagi's existing codebase
- run QA against the codebase to clarify tasks
- produce and test pure Python code with the code interpreter, following up automatically if the quality falls short
- output final, runnable Python code

This tutorial shows how to automatically produce high-quality prototype and draft code customized for your own codebase.

In [1]:
# %pip install --upgrade --force-reinstall lionagi llama-index pyautogen tree_sitter tree_sitter_languages

1. Create a new folder, download the notebook into it, and open it in your IDE.


2. Put your OPENAI_API_KEY in a `.env` file in the project directory.


3. Download lionagi's source code: only the `lionagi` folder under the root of the lionagi GitHub repo, not the whole repo.


4. Rename the folder to `lionagi_codes` and put it in the same directory as your notebook.

5. We will use `lionagi`'s entire source code, not its documentation, as our data source.
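Before building the index, it can help to confirm that the folder from steps 3 and 4 is where the notebook expects it. The helper below is a minimal sketch, not part of lionagi or LlamaIndex; the name `list_source_files` is hypothetical and it only assumes the `lionagi_codes` layout described above.

```python
from pathlib import Path


def list_source_files(data_dir, extensions=(".py",)):
    """Return every file under data_dir whose suffix is in extensions, sorted by path."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        # Fail early with a readable message instead of indexing nothing
        raise FileNotFoundError(f"Expected a source folder at {data_dir}")
    return sorted(
        path
        for path in data_dir.rglob("*")
        if path.is_file() and path.suffix in extensions
    )
```

Calling `list_source_files(Path.cwd() / "lionagi_codes")` should return a non-empty list if the download and rename steps worked.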

In [2]:
from pathlib import Path
from IPython.display import Markdown

In [3]:
ext = [".py"]  # extension of files of interest
data_dir = Path.cwd() / "lionagi_codes"  # directory of source data - lionagi codebase

In [4]:
import llama_index.core

llama_index.core.__version__

'0.10.23.post1'

## 1. Setup

- LlamaIndex query engine
- AutoGen coder

#### a. LlamaIndex query engine

In [5]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding, OpenAIEmbeddingModelType

Settings.llm = OpenAI(model="gpt-4-turbo-preview")
Settings.embed_model = OpenAIEmbedding(
    model=OpenAIEmbeddingModelType.TEXT_EMBED_3_LARGE
)

#### Build from Data

In [6]:
# from llama_index.core import SimpleDirectoryReader
# from llama_index.core.text_splitter import CodeSplitter

# code_splitter = CodeSplitter(
#     language="python",
#     chunk_lines=100,
#     chunk_lines_overlap=10,
#     max_chars=1500
# )

# reader = SimpleDirectoryReader(data_dir, required_exts=['.py'], recursive=True)
# documents = reader.load_data()
# nodes = code_splitter.get_nodes_from_documents(documents);

In [7]:
# from llama_index.core import VectorStoreIndex

# index = VectorStoreIndex(nodes)
# index.storage_context.persist(persist_dir="./lionagi_index")
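To give a feel for what `chunk_lines=100` and `chunk_lines_overlap=10` mean, here is a simplified sliding-window sketch in plain Python. It is not how `CodeSplitter` works internally (that splitter parses the code with tree-sitter and also respects `max_chars`); the function name `split_into_line_chunks` is hypothetical and only illustrates windows of lines that share a few lines with their neighbours.

```python
def split_into_line_chunks(text, chunk_lines=100, overlap=10):
    """Split text into windows of chunk_lines lines, each sharing overlap lines with the previous one."""
    if not 0 <= overlap < chunk_lines:
        raise ValueError("overlap must be non-negative and smaller than chunk_lines")
    lines = text.splitlines()
    step = chunk_lines - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + chunk_lines]))
        if start + chunk_lines >= len(lines):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a definition that straddles a chunk boundary still appears in one piece in at least one neighbouring chunk, as long as it is shorter than the overlap.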

#### Build from Storage

In [8]:
from llama_index.core import load_index_from_storage, StorageContext

index_id = "759f925b-5c71-4f59-a11f-4c018feb9e0f"

storage_context = StorageContext.from_defaults(persist_dir="./lionagi_index")
index = load_index_from_storage(storage_context, index_id=index_id)

#### Build Query Engine

In [9]:
from llama_index.core.postprocessor import LLMRerank

reranker = LLMRerank(choice_batch_size=10, top_n=5)
query_engine = index.as_query_engine(node_postprocessors=[reranker])
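The reranker takes the retrieved nodes, scores them in batches of `choice_batch_size`, and keeps the `top_n` best. The sketch below shows that batching and selection logic in plain Python under one loud assumption: `score_batch` is a hypothetical stand-in for the LLM call that `LLMRerank` makes, so this is an illustration of the idea, not the library's implementation.

```python
def rerank_in_batches(candidates, score_batch, batch_size=10, top_n=5):
    """Score candidates batch by batch, then return the top_n highest-scoring ones."""
    scored = []
    for start in range(0, len(candidates), batch_size):
        batch = candidates[start:start + batch_size]
        scores = score_batch(batch)  # hypothetical scorer: one score per candidate
        if len(scores) != len(batch):
            raise ValueError("score_batch must return one score per candidate")
        scored.extend(zip(scores, batch))
    # Highest score first; sorted() is stable, so ties keep retrieval order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:top_n]]
```

Smaller batches keep each scoring prompt short at the cost of more calls, which is the trade-off `choice_batch_size` controls.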

#### Try out Query Engine

In [10]:
query = "think step by step, explain what does BaseAgent object do in details, also explain its relationship with executable branch"
response = await query_engine.aquery(query)

In [11]:
display(Markdown(response.response))
print(response.source_nodes[0])

The `BaseAgent` object, as defined in the provided code, is a complex entity designed to manage and execute tasks within a given structure, alongside an executable object, and potentially parse the output of these tasks. Here's a detailed breakdown of its functionality and its relationship with the executable branch:

1. **Initialization (`__init__` method)**: When a `BaseAgent` object is instantiated, it initializes itself with a given structure, an executable object, and an optional output parser. It also creates a `StartMail` object and a `MailManager` object. The `MailManager` is initialized with a list containing the structure, the executable object, and the `StartMail` object. This setup suggests that the agent is designed to manage communications or tasks involving these components.

2. **Mail Manager Control (`mail_manager_control` method)**: This asynchronous method continuously checks if either the structure or the executable object has been instructed to stop executing. If either is set to stop, the method then instructs the `MailManager` to also stop. This loop runs at a specified refresh time interval, ensuring that the agent can respond promptly to stop commands.

3. **Execution (`execute` method)**: The execution flow begins by setting the `start_context` with a provided context and triggering the `start` method of the `StartMail` object with the context and identifiers for the structure and executable. It then asynchronously calls multiple execute methods (for the structure, the executable, the mail manager, and the mail manager control) with a specified delay. After these calls, it resets the stop flags for the structure, executable, and mail manager to `False`, indicating they are ready for further operations. If an output parser is defined, it processes the agent itself as its input and returns the result.

4. **Relationship with the Executable Branch**: The executable branch seems to interact with the `BaseAgent` in a couple of ways, primarily through the `_agent_process` method. This method takes an agent (presumably a `BaseAgent` instance) and a verbosity flag as inputs. It then executes the agent with a given context (responses), optionally prints verbose messages, and appends the result of the agent's execution to its responses. This suggests that the executable branch is responsible for managing or orchestrating the execution of agents, possibly handling their outputs and integrating them into a larger workflow or process. The `_process_start` and `_process_end` methods in the executable branch indicate it also manages the lifecycle of tasks or operations, initiating them with a start mail and concluding them upon receiving an end mail, further integrating with the agent's communication or task management mechanisms.

In summary, the `BaseAgent` is designed as a central component in a system that manages and executes tasks, communicates through mails, and potentially processes the results of these tasks. Its relationship with the executable branch indicates a collaborative interaction where the branch orchestrates the execution of agents, manages their lifecycle, and integrates their outputs into broader processes or workflows.

Node ID: 14a70fdf-9ab7-4171-a28e-19f3f0e18467
Text: class BaseAgent(BaseRelatableNode):     def __init__(self,
structure, executable_obj, output_parser=None) -> None:
super().__init__()         self.structure = structure
self.executable = executable_obj         self.start = StartMail()
self.mailManager = MailManager([self.structure, self.executable,
self.start])         s...
Score:  10.000



#### b. Setup Autogen

- define a function that returns the AutoGen `user_proxy` and `gpt_assistant` agents, which will be used to write the code

In [12]:
coder_instruction = """
You are an expert at writing python codes. 
1. Write pure python codes, and 
2. run it to validate the codes
3. then return with the full implementation when the task is resolved and there is no problem. and add the word   TERMINATE
4. Reply FAILED if you cannot solve the problem.
"""

In [13]:
import autogen
from autogen.agentchat.contrib.gpt_assistant_agent import GPTAssistantAgent
from autogen.agentchat import UserProxyAgent


def get_autogen_coder():
    config_list = autogen.config_list_from_json(
        "OAI_CONFIG_LIST",
        file_location=".",
        filter_dict={
            "model": [
                "gpt-4-turbo-preview",
            ],
        },
    )

    # Initiate an agent equipped with code interpreter
    gpt_assistant = GPTAssistantAgent(
        name="Coder Assistant",
        llm_config={
            "assistant_id": None,
            "tools": [{"type": "code_interpreter"}],
            "config_list": config_list,
        },
        instructions=coder_instruction,
        verbose=False,
    )

    user_proxy = UserProxyAgent(
        name="user_proxy",
        # treat a missing or None "content" as an empty string
        is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
        code_execution_config={
            "work_dir": "coding",
            "use_docker": False,  # set to True or image name like "python:3" to use docker
        },
        human_input_mode="NEVER",
    )

    return gpt_assistant, user_proxy

#### c. Create Tools

- ingest data
- create a tool for the query engine
- create a tool for the AutoGen coder

In [14]:
source_codes_responses = []


async def query_codebase(query):
    """
    Perform a query to a QA bot with access to a vector index built with package lionagi codebase

    Args:
        query (str): The query string to search for in the LionAGI codebase.

    Returns:
        str: The string representation of the response content from the codebase query.
    """
    response = await query_engine.aquery(query)
    source_codes_responses.append(response)
    return str(response.response)

##### c-3. Create Coder tool

In [16]:
gpt_assistant, user_proxy = None, None  # placeholders, assigned in the next cell
coder_responses = []

Suppress the messages printed while creating the agent.

In [17]:
%%capture

gpt_assistant, user_proxy = get_autogen_coder();

In [18]:
def continue_code_python(instruction):
    """
    Continues an ongoing chat session. Give an instruction to a coding assistant to write pure python codes

    Args:
        instruction (str): The follow-up instruction or query to send to the GPT assistant.

    Returns:
        str: The latest message received from the GPT assistant in response to the follow-up instruction.
    """
    user_proxy.send(recipient=gpt_assistant, message=instruction)
    coder_responses.append(gpt_assistant.chat_messages)
    return str(gpt_assistant.last_message())

In [19]:
user_proxy.initiate_chat(
    gpt_assistant, message="here is a test, create a hello world in python"
);

user_proxy (to Coder Assistant):

here is a test, create a hello world in python

--------------------------------------------------------------------------------
Coder Assistant (to user_proxy):

Here is the Python code that prints "Hello, World!":

```python
print("Hello, World!")
```

TERMINATE


--------------------------------------------------------------------------------


#### d. Put into `lionagi.Tool` object

In [20]:
from lionagi import func_to_tool

tools = func_to_tool([query_codebase, continue_code_python])
tools[0].content = source_codes_responses
tools[1].content = coder_responses
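The docstrings on `query_codebase` and `continue_code_python` matter because a tool wrapper typically turns the function name, signature and docstring into a schema the LLM can call. The sketch below shows one way such a conversion could look using only the standard library. It is not lionagi's `func_to_tool` implementation; `function_to_schema` is a hypothetical name, and it assumes every parameter is a string, which holds for the two tools above.

```python
import inspect


def function_to_schema(func):
    """Build an OpenAI-style function-calling schema from a function and its docstring."""
    doc = inspect.getdoc(func) or ""
    # Use the first paragraph of the docstring as the tool description
    description = doc.split("\n\n")[0].strip()
    params = inspect.signature(func).parameters
    # Assumption: every parameter is a plain string
    properties = {name: {"type": "string"} for name in params}
    required = [
        name
        for name, param in params.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }
```

A clear first paragraph in the docstring therefore becomes the description the model reads when deciding whether to call the tool.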

## 2. Draft Workflow

#### a. Prompts

In [21]:
system = {
    "persona": "A helpful software engineer",
    "requirements": """
        Think step-by-step and provide thoughtful, clear, precise answers. 
        Maintain a humble yet confident tone.
    """,
    "responsibilities": """
        Assist with coding in the lionagi Python package.
    """,
    "tools": """
        Use a QA bot for grounding responses and a coding assistant 
        for writing pure Python code.
    """,
}

function_call1 = {
    "notice": """
        Use the QA bot tool at least 2 times at each task step, 
        identified by the step number. You can ask up to 3 questions 
        each time. This bot can query source codes with natural language 
        questions and provides natural language answers. Decide when to 
        invoke function calls. You have to ask the bot for clarifications 
        or additional information as needed, up to ten times if necessary.
    """
}

function_call2 = {
    "notice": """
        Use the coding assistant tool at least once at each task step, 
        and again if a previous run failed. This assistant can write 
        and run Python code in a sandbox environment, responding to 
        natural language instructions with 'success' or 'failed'. Provide 
        clear, detailed instructions for AI-based coding assistance. 
    """
}

# Step 1: Understanding User Requirements
instruct1 = {
    "task_step": "1",
    "task_name": "Understand User Requirements",
    "task_objective": "Comprehend user-provided task fully",
    "task_description": """
        Analyze and understand the user's task. Develop plans 
        for approach and delivery. 
    """,
}

# Step 2: Proposing a Pure Python Solution
instruct2 = {
    "task_step": "2",
    "task_name": "Propose a Pure Python Solution",
    "task_objective": "Develop a detailed pure Python solution",
    "task_description": """
        Customize the coding task for lionagi package requirements. 
        Use a QA bot for clarifications. Focus on functionalities 
        and coding logic. Add lots more details here for 
        more finetuned specifications. If the assistant's work is not
        of sufficient quality, rerun the assistant tool
    """,
    "function_call": function_call1,
}


# Step 3: Writing Pure Python Code
instruct3 = {
    "task_step": "3",
    "task_name": "Write Pure Python Code",
    "task_objective": "Give detailed instruction to a coding bot",
    "task_description": """
        Instruct the coding assistant to write executable Python code 
        based on improved task understanding. The bot doesn't know
        the previous conversation, so you need to integrate the conversation
        and give detailed instructions. You cannot just say things like 
        'as previously described'. You must give detailed instructions such
        that a bot can write it. Provide a complete, well-structured full 
        implementation if successful. If it fails, rerun; after a second 
        failure, report 'Task failed' and the most recent code attempt. If the 
        assistant's work is not of sufficient quality, rerun the assistant tool
    """,
    "function_call": function_call2,
}

In [22]:
issue = {
    "create template class hierarchy": """
        currently lionagi takes a string or dictionary for instruction
        and context inputs, but as workflows get complex, this is not 
        convenient. Based on lionagi's codebase, propose and implement
        a suitable template class hierarchy. 
            1. return the fully implemented class hierarchy
            2. google style docstring 
            3. pep-8 linting
            4. add type hints
    """
}

In [23]:
from lionagi import Session


async def solve_in_python(context):

    coder = Session(system, tools=tools)
    coder.llmconfig.update({"temperature": 0.4})

    await coder.chat(instruct1, context=context)

    # instruction 2: clarify needs according to codebase
    await coder.followup(instruct2, max_followup=5, tools=tools[0])

    # use OpenAI code Interpreter to write pure python solution
    await coder.followup(instruct3, max_followup=3, tools=tools[1])

    # save to csv
    coder.to_csv_file()
    coder.log_to_csv()

    return coder
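The `max_followup` arguments above cap how many extra rounds each step may take. As a rough mental model only, a bounded follow-up loop can be sketched as below. This is not lionagi's `followup` implementation, whose stopping condition may differ; `run_with_followups`, `step` and `is_done` are hypothetical names used purely for illustration.

```python
import asyncio


async def run_with_followups(step, is_done, max_followup=3):
    """Call the async step up to max_followup times, stopping early once is_done(result) is true."""
    if max_followup < 1:
        raise ValueError("max_followup must be at least 1")
    result = None
    attempts = 0
    for attempt in range(1, max_followup + 1):
        attempts = attempt
        result = await step(attempt)  # hypothetical coroutine: one round of work
        if is_done(result):
            break
    return result, attempts
```

The cap keeps a step that never converges from looping indefinitely, which matters when every round costs an LLM call.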

In [24]:
coder = await solve_in_python(issue)

user_proxy (to Coder Assistant):

Write a Python class hierarchy for managing instructions and contexts in a workflow system. The hierarchy should include:

1. An abstract base class named `InstructionBase` with an abstract method `process` that does not take any arguments besides `self` and does not return anything.

2. Two derived classes from `InstructionBase` named `StringInstruction` and `DictInstruction`. `StringInstruction` should have an `__init__` method that accepts a string `instruction` and stores it. `DictInstruction` should have an `__init__` method that accepts a dictionary `instruction` and stores it. Both classes should implement the `process` method, which can simply pass for now.

3. An abstract base class named `ContextBase` with an abstract method `update_context` that accepts a new context as an argument and does not return anything.

4. Two derived classes from `ContextBase` named `SimpleContext` and `ComplexContext`. `SimpleContext` should have an `__in