`What is a subject?` $\implies$ Any Python function, method, class or block of code. 

We need an agent that decides which tool to use for a given query. It is made up of two agents: 
1. **Simple vs Complex Agent**. This agent classifies the user's query into one of these categories: 

- Simple: Zero or one subject present in the question. *Example: How does the function x work?*
- Complex: More than one subject. *Example: What are the differences between Class A and Class B?*

- Tool returned: **`SimilarityRetriever`** for `Complex` questions; `Simple` questions are passed on to the second agent.


2. **General vs Particular Agent**. This agent receives the output of the first agent only if the question type was **`Simple`**, and classifies the question into one of two categories: 

- Particular: The question involves only the subject. *Example: How does the function x work?*
- General: The question is phrased so that it concerns more than the subject itself. *Example: Will my changes break anything?*

- Tool returned: **`SimilarityRetriever`** (particular) or **`GeneralRetriever`** (general)
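
The two-stage decision described above can be sketched as a small routing function. This is a minimal sketch, not the `MultiAgent` implementation: the function name `choose_tool` and the string labels are assumptions made for illustration, and the mapping from category to retriever is inferred from the tool names printed in the runs later in this notebook.

```python
from typing import Optional


def choose_tool(question_type: str, context_type: Optional[str] = None) -> str:
    """Return the name of the retriever for a classified question.

    Hypothetical helper: the mapping is inferred from the notebook's log
    output, not taken from the MultiAgent source.
    """
    if question_type == "complex":
        # More than one subject: the second agent is never consulted.
        return "SimilarityRetriever"
    if question_type != "simple":
        raise ValueError(f"unknown question type: {question_type!r}")
    if context_type == "particular":
        return "SimilarityRetriever"
    if context_type == "general":
        return "GeneralRetriever"
    raise ValueError(f"unknown context type: {context_type!r}")


print(choose_tool("complex"))
print(choose_tool("simple", "particular"))
print(choose_tool("simple", "general"))
```

Only a simple question reaches the second decision, which is why `context_type` is optional.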

In [1]:
from app.agent.multi_agent import MultiAgent
from app.agent.agent import ContextTypeAgent, QuestionTypeAgent
from app.prompts.prompts import SIMPLE_VS_COMPLEX, GENERAL_VS_PARTICULAR_CONTEXT
from app.prompts.prompt import Prompt

import asyncio
import nest_asyncio
nest_asyncio.apply()

# simple vs complex -> 0-1 vs > 1 subjects
simple_vs_complex_prompt = Prompt(prompt=SIMPLE_VS_COMPLEX)
simple_vs_complex_agent = QuestionTypeAgent(
    instruction=simple_vs_complex_prompt
)

# general vs particular
general_vs_particular_prompt = Prompt(prompt=GENERAL_VS_PARTICULAR_CONTEXT)
general_vs_particular_agent = ContextTypeAgent(
    instruction=general_vs_particular_prompt
)

multi_agent = MultiAgent(agents=[
    simple_vs_complex_agent, 
    general_vs_particular_agent
])



# 1. Classification Testing

**We are going to test how the agent classifies the questions and which tool it chooses to answer each one.**

In [2]:
from app.database.base import get_db
from sqlalchemy.orm import Session

db = next(get_db())

In [3]:
from app.retrievers.general_retriever import GeneralRetriever
from app.retrievers.similarity_retriever import SimilarityRetriever


async def tool_pipeline(agent: MultiAgent, query: str, db: Session=db):
    tool, output = await agent.pipeline(query=query)
    tool: GeneralRetriever | SimilarityRetriever = tool(db=db)
    return tool.query_database(query=query, subjects=output.subject)

### 1.1 General question retriever

**This one is tricky because the question quotes a line of code in addition to naming the function. That is not an issue because of how our retrievers are built: they look for the subject in the function name, method name or class name fields of the NodeMetadata.node_metadata column. That is why Postgres is so powerful!**
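
The exact-match step described above can be illustrated without a database. This is a minimal sketch, assuming each node's metadata is a plain dict with optional `class_name`, `function_name` and `method_name` keys (the fields named above). `find_exact_matches` and the sample records are hypothetical; the real retriever queries the `NodeMetadata.node_metadata` column in Postgres instead.

```python
from typing import Dict, List

NAME_FIELDS = ("class_name", "function_name", "method_name")


def find_exact_matches(subject: str, nodes: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return every metadata record whose name fields contain `subject` exactly."""
    return [
        metadata
        for metadata in nodes
        if any(metadata.get(field) == subject for field in NAME_FIELDS)
    ]


# Hypothetical metadata records, shaped like the ones printed by the retriever.
sample_nodes = [
    {"function_name": "_create_node_relationships_file", "arguments": "db: Session"},
    {"function_name": "update_nodes_store", "arguments": "db: Session = Depends"},
    {"class_name": "NodePostProccesor"},
]

matches = find_exact_matches("_create_node_relationships_file", sample_nodes)
print(len(matches))
```

Because the subject is compared against the name fields only, the quoted line `os.remove(name_file)` in the question does not interfere with the lookup.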

In [5]:
query = "What will happen If I remove the line 'os.remove(name_file)' of the function _create_node_relationships_file"
nodes, nodes_with_score = asyncio.run(tool_pipeline(agent=multi_agent, query=query, db=db))

Agent reasoning response: The subject is the function _create_node_relationships_file therefore the answer is simple because there is only one subject
Agent answer: simple
Agent reasoning response: The question refers to what will happen when removing a line from the function, which is not about the subject itself, but about its effects, therefore the answer is general
Agent answer: general
Tool decided by the agent: GeneralRetriever

	Exact match of subject: {'_create_node_relationships_file'} in the database. --> 1


In [6]:
# we print some of the nodes in which the function _create_node_relationships_file appears
for node in nodes:
    print(node.text[:300])

def _create_node_relationships_file(db: Session):
    
    fields = ('class_name', 'function_name', 'method_name')
    name_file = os.environ['NAMES_FILE']
    relationships_file = os.environ['RELATIONSHIPS_FILE']
    
    os.remove(name_file)
    
    if not os.path.exists(name_file):
        nodes
async def update_nodes_store(db: Session = Depends(get_db)):
    updated_files = _create_file_node(path=os.environ['USER_CODE_DIRECTORY'], db=db)
    _create_node_relationships_file(db=db)
    return {"py_files": updated_files}

async def upload_file_zip(file: UploadFile = File(...), db: Session = Depends(get_db)):

    extract_dir = os.environ['USER_CODE_DIRECTORY']
    if not os.path.exists(extract_dir):
        # shutil.rmtree(extract_dir)
        
        os.makedirs(extract_dir, exist_ok=True)

        with open(f"{ext


### 1.2 Particular Question retrieval

**Here is a case where several retries were needed before the agent gave a coherent answer. Recall the rule: multiple subjects $\implies$ complex question, whereas one subject $\implies$ simple.**
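
The rule stated above (at most one subject means simple, more than one means complex) suggests how a retry loop could decide that an answer is not valid. The following is a sketch under that assumption: `is_consistent`, `classify_with_retries` and the `ask` callable are hypothetical names, and the agent's real validation logic is not shown in this notebook. The response keys mirror the JSON printed in the retry log.

```python
import json
from typing import Callable, Dict, Optional


def is_consistent(response: Dict) -> bool:
    """Check that the label agrees with the number of subjects found."""
    subjects = response.get("subject", [])
    expected = "simple" if len(subjects) <= 1 else "complex"
    return response.get("question_type") == expected


def classify_with_retries(ask: Callable[[], str], max_retries: int = 5) -> Optional[Dict]:
    """Call `ask` until it returns a consistent JSON answer or retries run out."""
    for _ in range(max_retries + 1):
        try:
            response = json.loads(ask())
        except json.JSONDecodeError:
            continue  # malformed JSON counts as a failed attempt
        if isinstance(response, dict) and is_consistent(response):
            return response
    return None


# Stand-in for the LLM: two inconsistent answers, then a consistent one.
canned = iter([
    '{"question_type": "complex", "subject": ["NodePostProccesor"]}',
    '{"question_type": "complex", "subject": ["NodePostProccesor"]}',
    '{"question_type": "simple", "subject": ["NodePostProccesor"]}',
])
result = classify_with_retries(lambda: next(canned))
print(result["question_type"])
```

Returning `None` when the retries run out leaves it to the caller to decide whether to fall back to a default tool or to report the failure.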

In [7]:
query = "Explain to me how the class NodePostProccesor is implemented"
nodes, _ = asyncio.run(tool_pipeline(agent=multi_agent, query=query, db=db))

Agent reasoning response: The subject is the class NodePostProccesor therefore the answer is complex because there is one class.
Agent answer: complex
Not valid answer. Applying retries
	Retry agent response: {"question_type": "complex", "subject": ["NodePostProccesor"], "reasoning": "The subject is the class NodePostProccesor therefore the answer is complex because there is one subject"}
	Retry agent response: {"question_type": "complex", "subject": ["NodePostProccesor"], "reasoning": "The subject is the class NodePostProccesor therefore the answer is complex because there is only one subject, which is a class"}
	Retry agent response: {"question_type": "complex", "subject": ["NodePostProccesor"], "reasoning": "The subject is the class NodePostProccesor therefore the answer is complex because there is only one subject which is a class"}
	Retry agent response: {"question_type": "simple", "subject": ["NodePostProccesor"], "reas

In [8]:
for node in nodes:
    print(node.text[:300])

class NodePostProccesor:
  
    def __init__(self, retrieved_nodes_score: List[NodeWithScore], db: Session, score_threshold: float=0.8, min_parent_nodes: int=2, min_file_nodes: int=2):
        
        self._retrieved_nodes_score = [node for node in retrieved_nodes_score if node.score > score_thresh


**The agent's output can modify the name of the subject: NodePostProccesor may come back as NodePostProcessor. When that happens we do not get an exact match from the database, so a similarity search is performed instead. We still get the appropriate Node.**
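
The fallback can be illustrated with a small sketch: try an exact match first, and only when that fails look for the closest known name. Here `difflib` from the standard library stands in for the similarity search. That is an assumption made for illustration, since the notebook's `SimilarityRetriever` scores nodes in the database rather than comparing strings. `resolve_subject` and `known_names` are hypothetical.

```python
import difflib
from typing import List, Optional


def resolve_subject(subject: str, known_names: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the stored name for `subject`: exact match first, closest match otherwise."""
    if subject in known_names:
        return subject
    close = difflib.get_close_matches(subject, known_names, n=1, cutoff=cutoff)
    return close[0] if close else None


known_names = [
    "NodePostProccesor",
    "_create_file_node",
    "_create_nodes_of_file",
    "__retrieve_relationship_nodes",
]

# The agent "corrected" the spelling, so there is no exact match.
print(resolve_subject("NodePostProcessor", known_names))
```

The `cutoff` plays the same role as a score threshold: raising it rejects weaker matches, at the cost of returning nothing for badly misspelled subjects.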

In [9]:
# let's try again with the same question but removing the word class
# this time it gets a coherent answer faster
query = "Explain to me how NodePostProccesor is implemented"
nodes = asyncio.run(tool_pipeline(agent=multi_agent, query=query, db=db))

Agent reasoning response: The subject is NodePostProccesor class therefore the answer is complex because there is one subject.
Agent answer: complex
Not valid answer. Applying retries
	Retry agent response: {"question_type": "simple", "subject": ["NodePostProcessor"], "reasoning": "The subject is NodePostProcessor, therefore the answer is simple because there is only one subject"}
Agent reasoning response: The question is regarding the implementation of NodePostProcessor itself, therefore the answer is particular
Agent answer: particular
Tool decided by the agent: SimilarityRetriever

Original Query: Explain to me how NodePostProccesor is implemented
	New query to look with subject: NodePostProcessor -->  explain to me how nodepostproccesor is implemented


In [10]:
for node in nodes[0]:
    print(node.text[:300])

class NodePostProccesor:
  
    def __init__(self, retrieved_nodes_score: List[NodeWithScore], db: Session, score_threshold: float=0.8, min_parent_nodes: int=2, min_file_nodes: int=2):
        
        self._retrieved_nodes_score = [node for node in retrieved_nodes_score if node.score > score_thresh


### 1.3 Complex question retrieval

**Now we want our Agent to identify multiple subjects.**

In [11]:
query = "How does the function _create_file_node depend on the function _create_nodes_of_file?"
nodes, _ = asyncio.run(tool_pipeline(agent=multi_agent, query=query, db=db))

Agent reasoning response: The subjects are: _create_file_node function and _create_nodes_of_file function, therefore the answer is complex because there are more than one subject.
Agent answer: complex
Tool decided by the agent: SimilarityRetriever

Original Query: How does the function _create_file_node depend on the function _create_nodes_of_file?
	Exact match of subject: _create_file_node in the database. --> 1
	Exact match of subject: _create_nodes_of_file in the database. --> 1


In [12]:
for node in nodes:
    print(node.text[:300])

def _create_file_node(path: str, db: Session):
    updated_files = []
    for root, _, files in os.walk(path):
        for file in files:
            lines = open(os.path.join(root, file), "r").readlines()
            text = "".join(lines)
            hash = calculate_hash(text)
            file = F
def _create_nodes_of_file(path: str, db: Session, file_id: str):
    files_structure_folder = os.environ['FILES_STRUCTURE_FOLDER']
    os.makedirs(files_structure_folder, exist_ok=True)
    
    if path.startswith("."): path = path[2:]
    elif path.startswith(".."): path = path[3:]
    
    file_na


# 2. Question-Answer with agents

**Now that we've seen that our agent classifies some questions properly, let's actually find an answer.**

In [36]:
from app.agent.llama_client import LlamaClient
from app.printer import Printer
from app.prompts.prompts import PROMPT_TO_ANSWER_QUESTIONS

printer = Printer()
query = "How does the function _create_file_node depend on the function _create_nodes_of_file?"

llm = LlamaClient()

def format_answer(answer: str, max_words: int) -> str:
    answer = answer.replace("\n\n", "\n")
    lines = answer.split("\n")
    processed_answer = " "
    for line in lines:
        words = line.split(" ")
        for k in range(0, len(words), max_words):
            processed_line = " ".join(words[k: k + max_words])
            processed_answer += "\n" + processed_line
    return processed_answer

async def query_pipeline(agent: MultiAgent, 
                         query: str, 
                         llm: LlamaClient, 
                         db: Session) -> str:
    tool, output = await agent.pipeline(query=query)
    tool: GeneralRetriever | SimilarityRetriever = tool(db=db)
    nodes, nodes_with_score = tool.query_database(query=query, subjects=output.subject)
    for node_with_score in nodes_with_score:
        printer.print_blue(f"Score: {node_with_score.score} for text: \n{node_with_score.node.text[:300]}\n")
    context = "\n".join([node.text for node in nodes])
    prompt = Prompt.format_prompt(prompt=PROMPT_TO_ANSWER_QUESTIONS, context=context, query=query)
    answer = await llm.acall(query=prompt)
    return answer

In [22]:
answer = asyncio.run(
    query_pipeline(agent=multi_agent, 
                   query=query, 
                   llm=llm, 
                   db=db)
)

Agent reasoning response: The subjects are: _create_file_node function and _create_nodes_of_file function, therefore the answer is complex because there are more than one subject.
Agent answer: complex
Tool decided by the agent: SimilarityRetriever

Original Query: How does the function _create_file_node depend on the function _create_nodes_of_file?
	Exact match of subject: _create_file_node in the database. --> 1
	Exact match of subject: _create_nodes_of_file in the database. --> 1
Score: 1 for text: 
def _create_file_node(path: str, db: Session):
    updated_files = []
    for root, _, files in os.walk(path):
        for file in files:
            lines = open(os.path.join(root, file), "r").readlines()
            text = "".join(lines)
            hash = calculate_hash(text)
            file = F

Score: 1 for text: 
def _create_nodes_of_file(path: str, db: Session, file_id: str):
    files_structure_folder = os.environ['FILES_STRUCTURE_FOLDER']
    os.makedirs(files_structure_folder, exist_ok=True)
    
    if path.startswith("."): path = path[2:]
    elif path.startswith(".."): path = path[3:]
    
    file_na


In [37]:
formated_answer = format_answer(answer=answer, max_words=15)
print(formated_answer)

 
The function `_create_file_node` depends on the function `_create_nodes_of_file` in the following ways:
1. In the `_create_file_node` function, when a new file is found, it calls the `_create_nodes_of_file`
function to create nodes for that file. This is evident from the line `db.commit()` and
then calling `_create_nodes_of_file` with the `file_path` and `db`.
2. The `_create_nodes_of_file` function is called twice in the `_create_file_node` function, once when a new
file is found, and again after updating the existing file.
Therefore, the function `_create_file_node` relies on the functionality of the `_create_nodes_of_file` function to create nodes
for each file.


In [38]:
query = "If I change the parameter depth of the method: __retrieve_relationship_nodes what effect will that have?"
answer = asyncio.run(
    query_pipeline(agent=multi_agent, 
                   query=query, 
                   llm=llm, 
                   db=db)
)

Agent reasoning response: The subject is the __retrieve_relationship_nodes method, therefore the answer is simple since there is only one subject
Agent answer: simple
Agent reasoning response: The question involves the effect, which is different than the subject, therefore the answer is general.
Agent answer: general
Tool decided by the agent: GeneralRetriever

	Exact match of subject: {'__retrieve_relationship_nodes'} in the database. --> 1
Score: 1 for text: 
    def __retrieve_relationship_nodes(self, base_id: str, node: Node, depth: int):
        if base_id == str(node.id) or depth == 0: 
            return [node.id]
        relations = []
        node_relationships = node.node_relationships
        if not node_relationships or not len(node_relationshi


In [40]:
formated_answer = format_answer(answer=answer, max_words=15)
print(formated_answer)

 
A great question!
The `depth` parameter in the `__retrieve_relationship_nodes` method controls how many levels of relationships are retrieved.
Let's look at the code:
```python
def __retrieve_relationship_nodes(self, base_id: str, node: Node, depth: int):
    ...
    if depth == 0:
        return [node.id]
    ...
    relations.extend(self.__retrieve_relationship_nodes(base_id=base_id, node=node_, depth=depth-1))
```
As you can see, the method calls itself recursively with a decreasing `depth` value until
it reaches `depth == 0`. When `depth` is 0, it simply returns a list containing
the ID of the current node.
If you change the `depth` parameter, you'll affect how many levels of relationships are retrieved:
* If `depth` is increased (e.g., from 2 to 3), more levels of relationships will
be retrieved.
* If `depth` is decreased (e.g., from 2 to 1), fewer levels of relationships will
be retrieved.
For example, if you call the method with `depth=2`, it will retrieve all relationships at

This is the same question, but now another subject is involved to confuse the Agent. The *NodePostProccesor* class is the class that defines the method *__retrieve_relationship_nodes*. 

In [43]:
query = "If I change the parameter depth of the method: __retrieve_relationship_nodes of the class NodePostProccesor what effect will that have?"
answer = asyncio.run(
    query_pipeline(agent=multi_agent, 
                   query=query, 
                   llm=llm, 
                   db=db)
)

Agent reasoning response: The subjects are: NodePostProcessor class and __retrieve_relationship_nodes method, therefore the answer is complex because there are more than one subject.
Agent answer: complex
Tool decided by the agent: SimilarityRetriever

Original Query: If I change the parameter depht of the method: __retrieve_relationship_nodes of the class NodePostProccesor what effect will that have?
	New query to look with subject: NodePostProcessor -->  if i change the parameter depht of the method: __retrieve_relationship_nodes of the class nodepostproccesor what effect will that have?
	Exact match of subject: __retrieve_relationship_nodes in the database. --> 1
Score: 0.760384105455402 for text: 
class NodePostProccesor:
  
    def __init__(self, retrieved_nodes_score: List[NodeWithScore], db: Session, score_threshold: float=0.8, min_parent_nodes: int=2, min_file_nodes: int=2):
        
        self._retrieved_nodes_score = [node for node in retrieved_nodes_score if node.score > score_thresh

Score: 1 for text: 
    def __retrieve_relationship_nodes(self, base_id: str, node: Node, depth: int):
        if base_id == str(node.id) or depth == 0: 
            return [node.id]
        relation

**Very interesting.** 


The agent has "corrected" *NodePostProccesor* to *NodePostProcessor*, so there is no exact match for that subject! Nevertheless, similarity search still returns the *NodePostProccesor* node (score 0.76), which is great!

In [44]:
formated_answer = format_answer(answer=answer, max_words=15)
print(formated_answer)

 
Let's dive into the code and see how the `depth` parameter affects the behavior of
the `__retrieve_relationship_nodes` method.
Here's the relevant code:
```python
def __retrieve_relationship_nodes(self, base_id: str, node: Node, depth: int):
    if base_id == str(node.id) or depth == 0:
        return [node.id]
    relations = []
    node_relationships = node.node_relationships
    if not node_relationships or not len(node_relationships):
        return [node.id]
    for id, _ in node_relationships.items():
        node_ = self._db.get(Node, id)
        relations.extend(self.__retrieve_relationship_nodes(base_id=base_id, node=node_, depth=depth-1))
    return relations
```
The `depth` parameter controls how many levels of relationships are recursively traversed. Here's what happens
when you change the value of `depth`:
* If `depth` is 0, the method will only return the current node's ID (`node.id`).
No recursive traversal will occur.
* If `depth` is greater than 0, the method will re

In [46]:
query = "What will happen in the repository If I remove the line 'os.remove(name_file)' of the function _create_node_relationships_file"
answer = asyncio.run(
    query_pipeline(agent=multi_agent, 
                   query=query, 
                   llm=llm, 
                   db=db)
)

Agent reasoning response: The subject is the function _create_node_relationships_file therefore the answer is complex because there is only one subject
Agent answer: complex
Not valid answer. Applying retries
	Retry agent response: {"question_type": "simple", "subject": ["os"], "reasoning": "The subject is the os function therefore the answer is simple because there is only one subject."}
Agent reasoning response: The question involves the repository, which is different than the subject, therefore the answer is general.
Agent answer: general
Tool decided by the agent: GeneralRetriever
Metadata of the node that we're considering: {'arguments': 'db: Session', 'function_name': '_create_node_relationships_file'}
Metadata of the node that we're considering: {'arguments': 'db: Session = Depends', 'function_name': 'update_nodes_store'}


AttributeError: 'NoneType' object has no attribute 'id'