## RAG-Assisted Auto Developer
-- with LionAGI, LlamaIndex, AutoGen and the OpenAI Code Interpreter


Let us develop a dev bot that can:
- read and understand lionagi's existing codebase
- run QA against the codebase to clarify tasks
- produce and test pure Python code with the code interpreter, following up automatically if the quality is lower than expected
- output final, runnable Python code

This tutorial shows how to automatically produce high-quality prototype and draft code customized for your own codebase.

In [1]:
# %pip install lionagi

1. Create a new folder, download the notebook into it, and open it in your IDE.

2. Put your `OPENAI_API_KEY` in a `.env` file in the project directory. AutoGen also needs an `OAI_CONFIG_LIST` in the working directory, which `get_autogen_coder` loads in section 1b.

3. Download lionagi's source code: only the `lionagi` folder under the root of the lionagi GitHub repo, not the whole repo.

4. Rename the folder to `lionagi_codes` and put it in the same directory as your notebook.

5. We will use `lionagi`'s entire source code, not its documentation, as our data source.

In [2]:
from pathlib import Path
from IPython.display import Markdown

In [3]:
ext=[".py"]             # extension of files of interest
data_dir = Path.cwd() / 'lionagi_codes'   # directory of source data - lionagi codebase
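Before building an index, it helps to confirm that `data_dir` actually contains the files that `ext` selects. The sketch below walks the folder recursively with `pathlib`; `collect_source_files` is a hypothetical helper that mimics what the `required_exts` and `recursive=True` reader options do, not part of lionagi or llama_index.

```python
from pathlib import Path
from typing import Iterable, List


def collect_source_files(root: Path, exts: Iterable[str]) -> List[Path]:
    """Recursively list files under `root` whose suffix is in `exts`, sorted."""
    if not root.is_dir():
        return []  # nothing to index yet (e.g. lionagi_codes not downloaded)
    wanted = {e.lower() for e in exts}
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


# hypothetical sanity check against the folder prepared in the setup steps
files = collect_source_files(Path.cwd() / "lionagi_codes", [".py"])
print(f"{len(files)} .py files found")
```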

## 1. Setup

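Note: this notebook only works with the legacy `llama_index` API (the pre-0.10 import layout, e.g. `from llama_index import ServiceContext`).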

#### a. define functions to interact with `llama_index` 

- define an index-creation function
- define a `query_engine` creation function

In [4]:
# check that llama-index is importable; attempts to install it if missing
from lionagi.util.import_util import ImportUtil

ImportUtil.check_import('llama-index')

In [5]:
# define a function to build index object
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding

def get_index(chunks, persist_dir, persist=True):
    llm = OpenAI(temperature=0.1, model="gpt-4-turbo-preview")
    embedding = OpenAIEmbedding(model='text-embedding-3-large')     # use the large embedding model for better performance
    index_service_context = ServiceContext.from_defaults(llm=llm, embed_model=embedding)
    
    _index = VectorStoreIndex(
        chunks, 
        include_embeddings=True, 
        service_context=index_service_context
    )
    if persist:
        _index.storage_context.persist(persist_dir=persist_dir)
    return _index

In [6]:
# define a function to build query engine with LLM rerank
from llama_index.postprocessor import LLMRerank
def get_query_engine(index_, temperature=0.1, model="gpt-4-turbo-preview", rerank=True, **kwargs):
    
    llm = OpenAI(temperature=temperature, model=model)
    service_context = ServiceContext.from_defaults(llm=llm)
    if rerank:
        reranker = LLMRerank(
            choice_batch_size=10,
            top_n=5,
            service_context=service_context,
        )
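        # the retriever returns only 2 nodes by default, too few for top_n=5 to matter
        kwargs.setdefault("similarity_top_k", 10)
        kwargs.update({"node_postprocessors": [reranker]})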
        
    query_engine = index_.as_query_engine(**kwargs)
    return query_engine
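With `rerank=True`, retrieval happens in two stages: the index first returns the chunks whose embeddings are closest to the query, then `LLMRerank` has the LLM score those candidates (in batches of `choice_batch_size`) and keeps the best `top_n`. The toy sketch below mirrors only that shape; the bag-of-words `embed` and the `rerank_score` callable are hypothetical stand-ins for the embedding model and the LLM judge, and batching is omitted.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query: str,
    chunks: List[str],
    rerank_score: Callable[[str, str], float],
    similarity_top_k: int = 10,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Stage 1: keep the `similarity_top_k` chunks most similar to the query.
    Stage 2: re-score only those candidates and return the best `top_n`."""
    q = embed(query)
    candidates = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    candidates = candidates[:similarity_top_k]
    rescored = [(c, rerank_score(query, c)) for c in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]
```

The reranker can only reorder what the first stage returns: with `similarity_top_k=2`, asking for `top_n=5` still yields at most two chunks.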

#### b. Set up AutoGen

- define a function that returns an AutoGen `gpt_assistant` (equipped with the code interpreter) and a `user_proxy`, which together write and run code

In [None]:
from lionagi.util.import_util import ImportUtil

ImportUtil.check_import(
    package_name = 'autogen', 
    module_name = 'agentchat', 
    pip_name = 'pyautogen'
)

In [8]:
coder_instruction = """
    You are an expert at writing Python code.
    1. Write pure Python code.
    2. Run it to validate the code.
    3. When the task is resolved and there are no problems, return the full implementation and add the word TERMINATE.
    4. Reply FAILED if you cannot solve the problem.
    """

In [9]:
import autogen
from autogen.agentchat.contrib.gpt_assistant_agent import GPTAssistantAgent
from autogen.agentchat import UserProxyAgent

def get_autogen_coder():
    config_list = autogen.config_list_from_json(
        "OAI_CONFIG_LIST",
        file_location=".",
        filter_dict={
            "model": ["gpt-3.5-turbo", "gpt-35-turbo", "gpt-4", "gpt4", "gpt-4-32k", "gpt-4-turbo-preview"],
        },
    )

    # Initiate an agent equipped with code interpreter
    gpt_assistant = GPTAssistantAgent(
        name="Coder Assistant",
        llm_config={
            "assistant_id": None,
            "tools": [
                {
                    "type": "code_interpreter"
                }
            ],
            "config_list": config_list,
        },
        instructions=coder_instruction,
        verbose = False
    )

    user_proxy = UserProxyAgent(
        name="user_proxy",
        is_termination_msg=lambda msg: "TERMINATE" in msg["content"],
        code_execution_config={
            "work_dir": "coding",
            "use_docker": False,  # set to True or image name like "python:3" to use docker
        },
        human_input_mode="NEVER"
    )
    
    return gpt_assistant, user_proxy
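The coder instruction defines a small reply protocol: `TERMINATE` signals success and `FAILED` signals giving up. The `is_termination_msg` lambda above only checks for `TERMINATE`, and indexes `msg["content"]` directly, which raises if `content` is `None`. A slightly more defensive predicate is sketched below, assuming messages are dicts whose `content` may be missing or `None`; `classify_reply` is a hypothetical helper, and `is_termination_msg` could be passed to `UserProxyAgent` in place of the lambda.

```python
from typing import Any, Dict, Optional


def classify_reply(msg: Dict[str, Any]) -> Optional[str]:
    """Return "TERMINATE" or "FAILED" if the reply ends the exchange, else None.

    `content` may be absent or None (e.g. messages carrying only tool calls),
    so it is coerced to an empty string before searching.
    """
    content = msg.get("content") or ""
    if "TERMINATE" in content:
        return "TERMINATE"
    if "FAILED" in content:
        return "FAILED"
    return None


def is_termination_msg(msg: Dict[str, Any]) -> bool:
    """Predicate suitable for UserProxyAgent(is_termination_msg=...)."""
    return classify_reply(msg) is not None
```

Substring matching is crude: a reply that merely quotes a test log containing `FAILED` would also end the chat, so a stricter check (e.g. on the last line only) may be preferable.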

#### c. Create Tools

- ingest data
- create tools for the query engine
- create tools for the AutoGen coder

##### c-1: Load Data

In [10]:
# as of v0.0.300, the load and chunk functions are deprecated;
# please use llama_index or langchain directly.
# They will be reintroduced in 0.0.301.


# # with lionagi interface using llama_index loaders to load and chunk codebase
# load_config = {
#     "reader": "SimpleDirectoryReader", 
#     "reader_type": "llama_index", 
#     "reader_kwargs": {
#         "required_exts":ext,
#         "input_dir": data_dir, 
#         "recursive": True
#         }, 
#     "to_datanode": False
# }

# chunk_config = {
#     "chunker": "CodeSplitter", 
#     "chunker_type": 'llama_index',
#     "chunker_kwargs": {    
#         "language":"python",
#         "chunk_lines":200,  # lines per chunk
#         "chunk_lines_overlap":10,  # lines overlap between chunks
#         "max_chars":3000,  # max chars per chunk
#     },
#     "to_datanode": False
# }

# docs = li.load(**load_config)
# chunks = li.chunk(docs, **chunk_config)
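For intuition about the `chunk_lines`, `chunk_lines_overlap`, and `max_chars` settings above, here is a simplified, hypothetical line-window chunker. llama_index's `CodeSplitter` is syntax-aware (it parses the source with tree-sitter), so its chunk boundaries will differ; this sketch only illustrates what "lines per chunk", "line overlap", and "max chars per chunk" mean.

```python
from typing import List


def window_chunks(
    text: str,
    lines_per_chunk: int = 200,
    overlap_lines: int = 10,
    max_chars: int = 3000,
) -> List[str]:
    """Split `text` into overlapping windows of whole lines.

    Each window starts `lines_per_chunk - overlap_lines` lines after the
    previous one; a window longer than `max_chars` is cut into pieces.
    """
    if not 0 <= overlap_lines < lines_per_chunk:
        raise ValueError("overlap_lines must be in [0, lines_per_chunk)")
    lines = text.splitlines()
    step = lines_per_chunk - overlap_lines
    chunks: List[str] = []
    for start in range(0, len(lines), step):
        window = "\n".join(lines[start:start + lines_per_chunk])
        # enforce the character budget by slicing oversized windows
        chunks.extend(window[i:i + max_chars] for i in range(0, len(window), max_chars))
        if start + lines_per_chunk >= len(lines):
            break  # this window already reached the last line
    return chunks
```

The overlap means the last `overlap_lines` lines of one chunk reappear at the start of the next, so a function cut at a boundary still has some surrounding context in both chunks.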

##### c-2: Create Query tool to QA codebase

In [11]:
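# load and chunk the codebase directly with llama_index (legacy API);
# CodeSplitter needs tree-sitter support installed (tree_sitter, tree_sitter_languages)
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import CodeSplitter

docs = SimpleDirectoryReader(input_dir=data_dir, required_exts=ext, recursive=True).load_data()
splitter = CodeSplitter(language="python", chunk_lines=200, chunk_lines_overlap=10, max_chars=3000)
chunks = splitter.get_nodes_from_documents(docs)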
index = get_index(chunks, persist_dir='storage/lionagi_codes_index', persist=True)
query_engine = get_query_engine(index, rerank=True)

source_codes_responses = []

# async, so several queries can be answered concurrently
async def query_codebase(query):
    """
    Perform a query to a QA bot with access to a vector index built with package lionagi codebase

    Args:
        query (str): The query string to search for in the LionAGI codebase.

    Returns:
        str: The string representation of the response content from the codebase query.
    """
    response = await query_engine.aquery(query)
    source_codes_responses.append(response)
    return str(response.response)
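Because `query_codebase` is a coroutine, several questions can be in flight at once. A minimal sketch with `asyncio.gather` follows; `FakeQueryEngine` and `FakeResponse` are hypothetical offline stand-ins for the llama_index engine (whose `aquery` makes network calls), so the timing behavior can be seen without an API key.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class FakeResponse:
    response: str


class FakeQueryEngine:
    """Hypothetical offline stand-in exposing an async `aquery` method."""

    async def aquery(self, query: str) -> FakeResponse:
        await asyncio.sleep(0.1)  # simulate network latency
        return FakeResponse(response=f"answer to: {query}")


async def ask_all(engine: FakeQueryEngine, queries: List[str]) -> List[str]:
    """Send every query at once; gather returns results in input order."""
    responses = await asyncio.gather(*(engine.aquery(q) for q in queries))
    return [r.response for r in responses]
```

In the notebook, call `await ask_all(...)` directly (Jupyter already runs an event loop, so `asyncio.run` raises there); in a plain script, use `asyncio.run(ask_all(...))`. With the real tool, the equivalent is `await asyncio.gather(*(query_codebase(q) for q in queries))`; note that `source_codes_responses` is then filled in completion order, not question order.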


In [13]:
query = 'think step by step, explain what does session object do in details'
a = await query_codebase(query)

print(source_codes_responses[0].source_nodes[0])

Node ID: 8c0432b4-4c3d-4ee9-9160-fd100f2fd89c
Text: class Session:
Score:  9.000



In [14]:
Markdown(a)

The `Session` object, as part of a larger application, appears to be designed to manage or facilitate a session within the context of the application's functionality. While the specific details of the `Session` class's methods and properties are not provided, we can infer its potential responsibilities and interactions within the application based on the imports and the few lines of code mentioned.

1. **Environment Configuration**: The `load_dotenv()` function call at the beginning suggests that the `Session` class may rely on environment variables for configuration. This could include settings like API keys, database connections, or other external service configurations that are necessary for the session's operation.

2. **OpenAI Service Integration**: The instantiation of `OpenAIService()` into the `OAIService` variable indicates that the `Session` object interacts with OpenAI services. This could be for a variety of purposes such as generating text, processing natural language, or any other AI-driven task that OpenAI's APIs offer.

3. **Data Handling with Pandas**: The import of `pandas` suggests that the `Session` object may deal with data manipulation, analysis, or transformation tasks. Pandas is a powerful library for handling structured data, so the `Session` could be involved in preparing or processing data for the session's main functionality.

4. **Type Annotations**: The import of various types from the `typing` module (e.g., `Any`, `List`, `Union`, `Dict`, `Optional`, `Callable`, `Tuple`) implies that the `Session` class is designed with type annotations in mind. This helps with code clarity and type checking, indicating a well-structured approach to defining the inputs and outputs of the class's methods.

5. **Application Components Integration**: The imports from relative paths (e.g., `lionagi.schema`, `lionagi._services.oai`, `..messages.messages`, `..branch.branch`, `..branch.branch_manager`) suggest that the `Session` object interacts with various components of the application. This could include:
   - Utilizing schemas defined in `lionagi.schema` for data validation or structure.
   - Sending or receiving messages through components imported from `..messages.messages`.
   - Managing or interacting with branches of a project or workflow, as suggested by the imports of `Branch` and `BranchManager`.

Based on these observations, the `Session` object likely serves as a central part of the application, orchestrating interactions between different services (like OpenAI), managing data, and facilitating the flow of information between various components of the application. It could be responsible for initializing the environment, handling user sessions, processing data, and leveraging AI services to achieve its goals. However, without more detailed information about the methods and properties of the `Session` class, this analysis remains speculative.

##### c-3: Create Coder tool

In [15]:
gpt_assistant, user_proxy = None, None  # placeholders until the agents are created below
coder_responses = []

Suppress the messages printed while creating the agents:

In [16]:
%%capture

gpt_assistant, user_proxy = get_autogen_coder();

In [17]:
def continue_code_python(instruction):
    """
    Continues an ongoing chat session. Give an instruction to a coding assistant to write pure python codes

    Args:
        instruction (str): The follow-up instruction or query to send to the GPT assistant.

    Returns:
        str: The latest message received from the GPT assistant in response to the follow-up instruction.
    """
    user_proxy.send(recipient=gpt_assistant, message=instruction)
    coder_responses.append(gpt_assistant.chat_messages)
    return str(gpt_assistant.last_message())

In [18]:
user_proxy.initiate_chat(
    gpt_assistant, 
    message='here is a test, create a hello world in python'
);

[33muser_proxy[0m (to Coder Assistant):

here is a test, create a hello world in python

--------------------------------------------------------------------------------
[33mCoder Assistant[0m (to user_proxy):

The "Hello, World!" program in Python has been created and successfully run. Here is the full implementation:

```python
print("Hello, World!")
```

TERMINATE


--------------------------------------------------------------------------------




#### d. Wrap the functions as `lionagi.Tool` objects

In [19]:
import lionagi as li

funcs = [query_codebase, continue_code_python]
tools = li.lcall(funcs, li.func_to_tool)

In [20]:
print(li.to_readable_dict(tools[0].schema_))

{
    "type": "function",
    "function": {
        "name": "query_codebase",
        "description": "Perform a query to a QA bot with access to a vector index built with package lionagi codebase",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The query string to search for in the LionAGI codebase."
                }
            },
            "required": [
                "query"
            ]
        }
    }
}
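Judging by the output above, `li.func_to_tool` builds the schema from the function itself: the docstring's summary line becomes `description`, and the `Args:` entry `query (str): ...` supplies both the type and the parameter description (`query_codebase` has no type annotation). A rough, hypothetical re-implementation, not lionagi's actual code, shows the idea; `func_to_schema` and the type mapping are assumptions.

```python
import inspect
import re
from typing import Any, Callable, Dict

# Hypothetical mapping from docstring type names to JSON Schema types.
_JSON_TYPES = {"str": "string", "int": "integer", "float": "number",
               "bool": "boolean", "list": "array", "dict": "object"}


def func_to_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style function schema from a Google-style docstring."""
    doc = inspect.getdoc(func) or ""
    summary = doc.split("\n\n")[0].replace("\n", " ").strip()
    # matches one-line entries such as "query (str): The query string ..."
    arg_pattern = re.compile(r"^\s*(\w+)\s*\((\w+)\):\s*(.+)$")
    args_doc: Dict[str, tuple] = {}
    in_args = False
    for line in doc.splitlines():
        if line.strip() == "Args:":
            in_args = True
            continue
        if in_args:
            if line and not line[0].isspace():
                break  # next section header, e.g. "Returns:"
            m = arg_pattern.match(line)
            if m:
                args_doc[m.group(1)] = (m.group(2), m.group(3).strip())

    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        type_name, desc = args_doc.get(name, ("str", ""))
        properties[name] = {"type": _JSON_TYPES.get(type_name, "string"),
                            "description": desc}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the argument is required

    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": summary,
            "parameters": {"type": "object", "properties": properties,
                           "required": required},
        },
    }
```

This is also why the docstrings of `query_codebase` and `continue_code_python` matter: they are what the LLM reads when deciding how to call each tool.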


## 2. Draft Workflow

#### a. Prompts

In [21]:
system = {
    "persona": "A helpful software engineer",
    "requirements": """
        Think step-by-step and provide thoughtful, clear, precise answers. 
        Maintain a humble yet confident tone.
    """,
    "responsibilities": """
        Assist with coding in the lionagi Python package.
    """,
    "tools": """
        Use a QA bot for grounding responses and a coding assistant 
        for writing pure Python code.
    """
}

function_call1 = {
    "notice": """
        Use the QA bot tool at least 2 times at each task step, 
        identified by the step number. You can ask up to 3 questions 
        each time. This bot can query source codes with natural language 
        questions and provides natural language answers. Decide when to 
        invoke function calls. You have to ask the bot for clarifications 
        or additional information as needed, up to ten times if necessary.
    """
}

function_call2 = {
    "notice": """
        Use the coding assistant tool at least once at each task step, 
        and again if a previous run failed. This assistant can write 
        and run Python code in a sandbox environment, responding to 
        natural language instructions with 'success' or 'failed'. Provide 
        clear, detailed instructions for AI-based coding assistance. 
    """
}

# Step 1: Understanding User Requirements
instruct1 = {
    "task_step": "1",
    "task_name": "Understand User Requirements",
    "task_objective": "Comprehend user-provided task fully",
    "task_description": """
        Analyze and understand the user's task. Develop plans 
        for approach and delivery. 
    """
}

# Step 2: Proposing a Pure Python Solution
instruct2 = {
    "task_step": "2",
    "task_name": "Propose a Pure Python Solution",
    "task_objective": "Develop a detailed pure Python solution",
    "task_description": """
        Customize the coding task for lionagi package requirements. 
        Use a QA bot for clarifications. Focus on functionalities 
        and coding logic. Add lots more details here for 
        more finetuned specifications. If the assistant's work is not
        of sufficient quality, rerun the assistant tool
    """,
    "function_call": function_call1
}


# Step 3: Writing Pure Python Code
instruct3 = {
    "task_step": "3",
    "task_name": "Write Pure Python Code",
    "task_objective": "Give detailed instruction to a coding bot",
    "task_description": """
        Instruct the coding assistant to write executable Python code 
        based on improved task understanding. The bot doesn't know
        the previous conversation, so you need to integrate the conversation
        and give detailed instructions. You cannot just say things like, 
        as previously described. You must give detailed instructions such
        that a bot can write it. Provide a complete, well-structured full 
        implementation if successful. If failed, rerun, report 'Task failed' 
        and the most recent code attempt after a second failure. If the 
        assistant's work is not of sufficient quality, rerun the assistant tool
    """,
    "function_call": function_call2
}

In [22]:
issue = {
    "create template class hierarchy":"""
        Currently lionagi takes a string or dictionary for instruction
        and context inputs, but as workflows get more complex, this is not
        convenient. Based on lionagi's codebase, propose and implement
        a suitable template class hierarchy.
            1. return the fully implemented class hierarchy
            2. google style docstring 
            3. pep-8 linting
            4. add type hints
    """
}

In [23]:
async def solve_in_python(context):
    
    coder = li.Session(system, tools=tools)
    
    # instruction 1: understand the user's task
    await coder.chat(instruct1, context=context, temperature=0.4)
    
    # instruction 2: clarify needs according to codebase
    await coder.auto_followup(
        instruct2, max_followup=5, temperature=0.4, tools=tools[0])
    
    # instruction 3: use the OpenAI Code Interpreter to write a pure Python solution
    await coder.auto_followup(
        instruct3, max_followup=3, temperature=0.4, tools=tools[1])
    
    # save to csv
    coder.to_csv()
    coder.log_to_csv()
    
    return coder
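Steps 2 and 3 rely on `auto_followup`, which lets the model keep invoking the given tool for up to `max_followup` further rounds. Conceptually this is a bounded tool-calling loop. The sketch below is hypothetical and is not lionagi's implementation: `ScriptedModel` replays canned replies so the loop runs offline, a dict reply stands for a tool call, and counting `max_followup` as rounds after the first reply is an assumption.

```python
from typing import Callable, Dict, List, Union

Reply = Union[str, Dict[str, str]]  # str = final answer, dict = tool call


class ScriptedModel:
    """Hypothetical stand-in for the LLM that replays pre-written replies."""

    def __init__(self, replies: List[Reply]):
        self._replies = iter(replies)

    def reply(self, history: List[str]) -> Reply:
        return next(self._replies)  # a real model would read `history`


def auto_followup_sketch(
    model: ScriptedModel,
    tools: Dict[str, Callable[[str], str]],
    instruction: str,
    max_followup: int = 3,
) -> List[str]:
    """Let the model call tools for at most `max_followup` extra rounds."""
    history = [instruction]
    for _ in range(max_followup + 1):  # first reply plus the follow-ups
        reply = model.reply(history)
        if isinstance(reply, dict):
            # tool call: run the named tool and feed its result back
            result = tools[reply["name"]](reply["arguments"])
            history.append(f"tool {reply['name']} -> {result}")
        else:
            history.append(reply)  # plain answer: the model is done
            break
    return history
```

The cap is what keeps a model that never stops calling tools from looping forever; raising `max_followup` (5 for the QA step, 3 for the coding step above) buys more grounding at the cost of more API calls.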

In [24]:
coder = await solve_in_python(issue)

[33muser_proxy[0m (to Coder Assistant):

Could you please write a Python code implementation that defines a class hierarchy for handling instructions and contexts in a more structured way? The hierarchy should include:

1. Abstract base classes named `InstructionBase` and `ContextBase`. Each of these classes should have at least one abstract method. For `InstructionBase`, define an abstract method `execute` that when implemented, should execute the instruction. For `ContextBase`, define an abstract method `get_context` that when implemented, should return the context data as a dictionary.

2. Concrete classes named `StringInstruction` and `DictInstruction` that inherit from `InstructionBase`. `StringInstruction` should accept a string as an instruction and return it upon execution. `DictInstruction` should accept a dictionary as an instruction and return it upon execution.

3. A concrete class named `ComplexWorkflowContext` that inherits from `ContextBase`. It should accept a diction

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

# Abstract base classes
class InstructionBase(ABC):
    """
    Abstract base class for all instructions.
    
    Subclasses must implement the `execute` method.
    """
    
    @abstractmethod
    def execute(self) -> Any:
        """
        Execute the instruction.
        
        Returns:
            The result of executing the instruction.
        """
        pass

class ContextBase(ABC):
    """
    Abstract base class for all contexts.
    
    Subclasses must implement the `get_context` method.
    """
    
    @abstractmethod
    def get_context(self) -> Dict[str, Any]:
        """
        Get the context data.
        
        Returns:
            A dictionary representing the context data.
        """
        pass

# Concrete instruction classes
class StringInstruction(InstructionBase):
    """
    Represents an instruction in the form of a string.
    
    Args:
        instruction (str): The instruction to be executed.
    """
    
    def __init__(self, instruction: str):
        self.instruction = instruction
    
    def execute(self) -> str:
        """
        Return the instruction string.
        
        Returns:
            The string representing the instruction.
        """
        return self.instruction

class DictInstruction(InstructionBase):
    """
    Represents an instruction in the form of a dictionary.
    
    Args:
        instruction (Dict[str, Any]): The instruction to be executed.
    """
    
    def __init__(self, instruction: Dict[str, Any]):
        self.instruction = instruction
    
    def execute(self) -> Dict[str, Any]:
        """
        Return the instruction dictionary.
        
        Returns:
            The dictionary representing the instruction.
        """
        return self.instruction

# Concrete context class
class ComplexWorkflowContext(ContextBase):
    """
    Represents a context for complex workflows.
    
    Args:
        context_data (Dict[str, Any]): A dictionary representing the context data.
    """
    
    def __init__(self, context_data: Dict[str, Any]):
        self.context_data = context_data
    
    def get_context(self) -> Dict[str, Any]:
        """
        Return the context data.
        
        Returns:
            The context data as a dictionary.
        """
        return self.context_data
```