# Planning Agent

In [9]:
import helper_tools.parser as parser
import importlib
import pandas as pd

importlib.reload(parser)

relation_df, entity_df, docs = parser.synthie_parser("train")


100%|██████████| 10/10 [00:00<00:00, 20203.78it/s]


Uploading Entities to Qdrant.


100%|██████████| 46/46 [00:11<00:00,  3.99it/s]


Uploading Predicates to Qdrant.


100%|██████████| 29/29 [00:10<00:00,  2.89it/s]


In [26]:
from langchain_openai import ChatOpenAI
from langchain_ollama.embeddings import OllamaEmbeddings
from langfuse.callback import CallbackHandler
from dotenv import load_dotenv
import os

load_dotenv()
langfuse_handler = CallbackHandler(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host=os.getenv("LANGFUSE_HOST"),
)

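# Planner LLM, served through SambaNova's OpenAI-compatible endpoint
model = ChatOpenAI(
    model_name="Meta-Llama-3.3-70B-Instruct",
    base_url="https://api.sambanova.ai/v1",
    api_key=os.getenv("SAMBANOVA_API_KEY"),
)
# Local embedding model served by Ollama
embeddings = OllamaEmbeddings(model='nomic-embed-text')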

In [27]:
target_doc = docs.iloc[0]
doc_id = target_doc["docid"]
text = target_doc["text"]
text

'Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom.'

In [36]:
from langchain_core.prompts import PromptTemplate

def planner(state):
    """Ask the planning LLM for the full plan (first call) or the next step (later calls).

    `state` must provide the keys text, results, call_trace and comments.
    """
    prompt = PromptTemplate.from_template("""
    You are an expert in planning and executing tasks within multi-agent systems. Your role is to design and refine a detailed plan that processes a given text into a triple format, specifically for closed information extraction using an underlying Knowledge Graph. You design the plan for the Agent Instructor agent, which executes your plan by calling and instructing agents. It can only execute one step at a time. Your plan must be based on the following inputs:
    - Agent Call Trace
    - Agent Comments
    - The provided input text
    - All intermediate results produced during the process
    
    To execute the tasks, you can include the following agents in the plan:
    - **Entity Extraction Agent:** Can extract entities from the text.
    - **Relation Extraction Agent:** Can extract relations from the text.
    - **URI Detection Agent:** Based on search terms, can determine if there is an associated entity or relation in the Knowledge Graph.
    
    Your plan should clearly outline the steps required to achieve the goal, ensuring that each phase is actionable and verifiable. The plan will be passed to the Agent Instructor, who will execute the steps through a series of Agent Calls. You will be asked to keep building up the plan until a final result has been reached. Your response should be precise, structured, and demonstrate deep expertise in orchestrating complex multi-agent systems for closed Information Extraction tasks. Lay out the plan you have for accomplishing the task. Do not include tasks that have already been worked on. Your plan does not have to include steps like triple formation or verification, as these are covered either by the result checker agent or externally.
    
    If you are called for the first time, write down the full plan. If you are called afterwards, just state what the next task is and where in your plan we are. The next task could also be to end the iteration because a reasonable result has been reached. If so, include <FINISH_MAS> in your response. Otherwise, never mention <FINISH_MAS> in your response.
    
    Please base your plan on the following information:
    
    Agent Call Trace: {call_trace}
    Agent Comments: {comments}
    The provided input text: {text}
    All intermediate results produced during the process: {results}
    """)

    # Pipe the filled-in prompt into the chat model
    response_chain = prompt | model

    # Trace the call in Langfuse; extra keys in `state` (e.g. "instruction") are ignored by the template
    response = response_chain.invoke(state, config={"callbacks": [langfuse_handler]})

    return response
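`planner` interpolates the raw Python lists straight into the prompt, so the model sees their `repr` (brackets, quotes, tuples). A minimal sketch of rendering each field as numbered plain text first, assuming the state layout used in the cells below (`call_trace` as `(agent_name, instruction)` tuples, `comments` and `results` as lists of strings); `format_planner_state` is a hypothetical helper, not part of the notebook:

```python
def format_planner_state(state: dict) -> dict:
    """Hypothetical helper: render list-valued planner inputs as numbered text.

    Assumes call_trace holds (agent_name, instruction) tuples and comments/results
    hold plain strings, as in the mock cells of this notebook.
    """
    def numbered(items):
        # An empty list becomes an explicit marker instead of "[]".
        if not items:
            return "(none)"
        return "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))

    trace_lines = [f"{agent}: {instruction}" for agent, instruction in state.get("call_trace", [])]
    return {
        "text": state["text"],
        "call_trace": numbered(trace_lines),
        "comments": numbered(state.get("comments", [])),
        "results": numbered(state.get("results", [])),
    }


formatted = format_planner_state({
    "text": "Corfe Castle railway station is a station on the Swanage Railway.",
    "call_trace": [("entity_extraction_agent", "Extract entities from the given text.")],
    "comments": [],
    "results": ["Output of entity_extraction_agent: Corfe Castle railway station, Swanage Railway"],
})
print(formatted["call_trace"])
```

Calling `planner(format_planner_state(state))` would then pass strings instead of lists; whether this actually improves the plans has not been tested here.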

In [37]:
response = planner({"text": text, "results": [], "call_trace": [], "comments": []})
print(response.content)

Given the initial state of the system with no prior agent calls, comments, or intermediate results, the plan to process the provided input text into a triple format for closed information extraction using the underlying Knowledge Graph will involve the following steps:

1. **Entity Extraction**: The first step is to extract entities from the input text. This involves calling the **Entity Extraction Agent** to identify and list all entities mentioned in the text. The input text is: "Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom."

2. **Relation Extraction**: After identifying the entities, the next step is to extract relations among these entities. This will be done by calling the **Relation Extraction Agent**. The agent will analyze the text to determine how the extracted entities are related to each other.

3. **URI Detection for Entities**: With the entities extracted, the next step is to determine if there are 
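The prompt asks the planner to emit `<FINISH_MAS>` only when it decides to stop, so whatever drives the iteration can check each response for that token. A minimal sketch; `should_finish` is a hypothetical name, and matching the exact tag is an assumption about how reliably the model follows the instruction:

```python
FINISH_TOKEN = "<FINISH_MAS>"

def should_finish(planner_output: str) -> bool:
    """Return True if the planner signalled that the iteration should end."""
    return FINISH_TOKEN in planner_output

print(should_finish("Step 3: Relation Extraction is next."))  # False
print(should_finish("A reasonable result has been reached. <FINISH_MAS>"))  # True
```

In the notebook this would be applied as `should_finish(response.content)`. This simple substring check is also why the prompt forbids mentioning the token in any other context.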

In [32]:
mock_plan = """
Given the inputs, the plan to process the provided text into a triple format for closed information extraction using an underlying Knowledge Graph is as follows:

1. **Text Preprocessing**:
   - **Task**: Clean and normalize the input text to prepare it for entity and relation extraction.
   - **Agent**: No specific agent is required for this step; it can be handled by the Agent Instructor itself or through a generic text preprocessing module.
   - **Input**: The provided input text.
   - **Output**: Preprocessed text.

2. **Entity Extraction**:
   - **Task**: Extract entities from the preprocessed text.
   - **Agent**: Entity Extraction Agent.
   - **Input**: Preprocessed text from Step 1.
   - **Output**: List of extracted entities.

3. **Relation Extraction**:
   - **Task**: Extract relations from the preprocessed text.
   - **Agent**: Relation Extraction Agent.
   - **Input**: Preprocessed text from Step 1.
   - **Output**: List of extracted relations.

4. **URI Detection for Entities**:
   - **Task**: For each extracted entity, determine if there is an associated entity in the Knowledge Graph.
   - **Agent**: URI Detection Agent.
   - **Input**: List of extracted entities from Step 2.
   - **Output**: List of entities with their corresponding URIs in the Knowledge Graph.

5. **URI Detection for Relations**:
   - **Task**: For each extracted relation, determine if there is an associated relation in the Knowledge Graph.
   - **Agent**: URI Detection Agent.
   - **Input**: List of extracted relations from Step 3.
   - **Output**: List of relations with their corresponding URIs in the Knowledge Graph.

6. **Triple Formation**:
   - **Task**: Construct triples using the entities and relations with their URIs.
   - **Agent**: No specific agent is required; this can be handled by the Agent Instructor or a generic triple formation module.
   - **Input**: Outputs from Steps 4 and 5.
   - **Output**: List of triples in the format (subject, predicate, object).

7. **Verification and Validation**:
   - **Task**: Verify that the formed triples are valid and consistent with the Knowledge Graph.
   - **Agent**: This step may involve the URI Detection Agent for validation against the Knowledge Graph.
   - **Input**: List of triples from Step 6.
   - **Output**: Final validated list of triples.

Given that this is the first call, the plan is outlined in its entirety. The next task to be executed is **Text Preprocessing** (Step 1), as it is the initial step in preparing the input text for further processing by the specialized agents.

We are currently at the beginning of the plan, with no tasks executed yet. The Agent Call Trace and Agent Comments are empty, and there are no intermediate results. The provided input text is "Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom."

"""

mock_feedback = """
Based on the provided plan and the current state of the multi-agent system, I can see that the entity extraction step has been completed, and the output is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

The next step in the plan is Relation Extraction, which involves extracting relations from the preprocessed text. To proceed, I recommend calling the Relation Extraction Agent with the preprocessed text as input.

Additionally, I would like to suggest that the preprocessed text should be made available for future reference, as it will be required for the Relation Extraction step. 

Please proceed with calling the Relation Extraction Agent. I will provide further feedback after reviewing the output of this step.
"""

comments = [mock_plan, mock_feedback]

mock_call_trace = [('entity_extraction_agent', 'Extract entities from the given text, focusing on locations and organizations.')]
mock_results = ["Output of entity_extraction_agent: Corfe Castle railway station, Swanage Railway, Corfe Castle, United Kingdom"]

response = planner({"text": text, "results": mock_results, "call_trace": mock_call_trace, "comments": comments, "instruction": ""})
print(response.content)

Based on the provided information, we are currently at Step 3 of the plan, which is **Relation Extraction**. The previous step, **Entity Extraction**, has been completed, and the output is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

The next task to be executed is to call the **Relation Extraction Agent** with the preprocessed text as input. The preprocessed text is the same as the original input text, as the **Text Preprocessing** step has not been explicitly mentioned as completed, but it can be assumed that the text has been preprocessed for the entity extraction step.

The input for the **Relation Extraction Agent** will be the preprocessed text: "Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom."

The expected output from the **Relation Extraction Agent** will be a list of extracted relations from the text.

Here is the current state of the plan
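The mock cells assemble the planner state by hand: each round appends the planner's response to `comments`, the agent call to `call_trace`, and the agent output to `results`. A minimal sketch of that round trip as a loop, assuming a hypothetical `act_fn` standing in for the Agent Instructor plus the called agent, and leaving out the result checker's feedback for brevity. In the notebook, `plan_fn` could be `lambda s: planner(s).content`; here both are stubbed so the loop runs on its own:

```python
from typing import Callable, Tuple

def run_planning_loop(
    text: str,
    plan_fn: Callable[[dict], str],
    act_fn: Callable[[str, dict], Tuple[str, str, str]],
    max_steps: int = 10,
) -> dict:
    """Hypothetical driver mirroring the manual mock cells: plan, act, record, repeat."""
    state = {"text": text, "call_trace": [], "comments": [], "results": []}
    for _ in range(max_steps):
        plan = plan_fn(state)
        state["comments"].append(plan)
        if "<FINISH_MAS>" in plan:
            break
        agent_name, instruction, output = act_fn(plan, state)
        state["call_trace"].append((agent_name, instruction))
        # Build the result label from the agent name so it cannot drift out of sync.
        state["results"].append(f"Output of {agent_name}: {output}")
    return state


# Stubbed agents returning canned outputs, in call order.
scripted = iter([
    ("entity_extraction_agent", "Extract entities.", "Corfe Castle railway station, Swanage Railway"),
    ("relation_extraction_agent", "Extract relations.",
     "<triple>Corfe Castle railway station, located on, Swanage Railway</triple>"),
])

def fake_planner(state):
    # Stop once two agents have been called.
    if len(state["call_trace"]) >= 2:
        return "<FINISH_MAS>"
    return f"Next step {len(state['call_trace']) + 1}"

def fake_act(plan, state):
    return next(scripted)

final = run_planning_loop("Corfe Castle railway station ...", fake_planner, fake_act)
print([agent for agent, _ in final["call_trace"]])
```

Generating the `Output of ...` label from the agent name avoids hand-labelling each result string.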

In [33]:
mock_plan = """
Based on the provided information, we are currently at Step 3 of the plan, which is **Relation Extraction**. The previous step, **Entity Extraction**, has been completed, and the output is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

The next task to be executed is to call the **Relation Extraction Agent** with the preprocessed text as input. The preprocessed text is the same as the original input text, as the **Text Preprocessing** step has not been explicitly mentioned as completed, but it can be assumed that the text has been preprocessed for the entity extraction step.

The input for the **Relation Extraction Agent** will be the preprocessed text: "Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom."

The expected output from the **Relation Extraction Agent** will be a list of extracted relations from the text.

Here is the current state of the plan:

1. **Text Preprocessing**: Completed (assumed)
2. **Entity Extraction**: Completed
	* Output: Corfe Castle railway station, Swanage Railway, Corfe Castle, United Kingdom
3. **Relation Extraction**: Pending
	* Input: Preprocessed text
	* Expected Output: List of extracted relations
4. **URI Detection for Entities**: Pending
5. **URI Detection for Relations**: Pending
6. **Triple Formation**: Pending
7. **Verification and Validation**: Pending

Please proceed with calling the **Relation Extraction Agent** with the preprocessed text as input. I will provide further feedback after reviewing the output of this step.

"""

mock_feedback = """
Based on the provided plan and the current state of the multi-agent system, I can see that the entity extraction step has been completed, and the output is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

The next step in the plan is Relation Extraction, which involves extracting relations from the preprocessed text. However, I notice that the Relation Extraction Agent has been called multiple times with the same input, and the output is a list of extracted triples. 

The output of the Relation Extraction Agent is:
<triple>Corfe Castle railway station, located on, Swanage Railway</triple>
<triple>Corfe Castle railway station, located in, Corfe Castle</triple>
<triple>Corfe Castle, located in, United Kingdom</triple>
<triple>Corfe Castle railway station, located in, United Kingdom</triple>

I recommend proceeding with the URI Detection for Entities step, which involves determining the URIs for the extracted entities in the Knowledge Graph. The input for this step will be the list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

Additionally, I suggest proceeding with the URI Detection for Relations step, which involves determining the URIs for the extracted relations in the Knowledge Graph. The input for this step will be the list of extracted relations: located on, located in.

Please proceed with calling the URI Detection Agent for both entities and relations. I will provide further feedback after reviewing the output of these steps. 

The current state of the plan is:
1. **Text Preprocessing**: Completed (assumed)
2. **Entity Extraction**: Completed
	* Output: Corfe Castle railway station, Swanage Railway, Corfe Castle, United Kingdom
3. **Relation Extraction**: Completed
	* Output: <triple>Corfe Castle railway station, located on, Swanage Railway</triple>
				 <triple>Corfe Castle railway station, located in, Corfe Castle</triple>
				 <triple>Corfe Castle, located in, United Kingdom</triple>
				 <triple>Corfe Castle railway station, located in, United Kingdom</triple>
4. **URI Detection for Entities**: Pending
5. **URI Detection for Relations**: Pending
6. **Triple Formation**: Pending
7. **Verification and Validation**: Pending

Please proceed with the next steps. I will review the output and provide further feedback.

"""

comments += [mock_plan, mock_feedback]

mock_call_trace += [("relation_extraction_agent", "Extract relations from the text, focusing on the entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.")]
mock_results += ["""Output of relation_extraction_agent: 
<triple>Corfe Castle railway station, located on, Swanage Railway</triple>
<triple>Corfe Castle railway station, located in, Corfe Castle</triple>
<triple>Corfe Castle, located in, United Kingdom</triple>
<triple>Corfe Castle railway station, located in, United Kingdom</triple>
"""]

response = planner({"text": text, "results": mock_results, "call_trace": mock_call_trace, "comments": comments, "instruction": ""})
print(response.content)

Based on the provided plan and the current state of the multi-agent system, we are currently at Step 4 and 5 of the plan, which are **URI Detection for Entities** and **URI Detection for Relations**. 

The previous steps, **Entity Extraction** and **Relation Extraction**, have been completed. The output of the **Entity Extraction** step is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom. The output of the **Relation Extraction** step is a list of extracted triples.

The next tasks to be executed are to call the **URI Detection Agent** for both entities and relations. 

For **URI Detection for Entities**, the input will be the list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom. The expected output will be a list of entities with their corresponding URIs in the Knowledge Graph.

For **URI Detection for Relations**, the input will be the list of extracted relations: locat
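The relation extraction output wraps each candidate in `<triple>...</triple>` tags with comma-separated fields, and the planner's next step feeds the entities and predicates to the URI Detection Agent as search terms. A minimal sketch of splitting that output, using a hypothetical `parse_triples` helper and assuming exactly three comma-separated fields per tag with no commas inside names (true for this example, not in general):

```python
import re
from typing import List, Tuple

TRIPLE_PATTERN = re.compile(r"<triple>(.*?)</triple>", re.DOTALL)

def parse_triples(agent_output: str) -> List[Tuple[str, str, str]]:
    """Extract (subject, predicate, object) tuples from <triple>...</triple> tags.

    Assumes each tag holds exactly three comma-separated fields; malformed tags are skipped.
    """
    triples = []
    for body in TRIPLE_PATTERN.findall(agent_output):
        parts = [part.strip() for part in body.split(",")]
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples


output = """Output of relation_extraction_agent:
<triple>Corfe Castle railway station, located on, Swanage Railway</triple>
<triple>Corfe Castle railway station, located in, Corfe Castle</triple>
<triple>Corfe Castle, located in, United Kingdom</triple>
<triple>Corfe Castle railway station, located in, United Kingdom</triple>
"""

triples = parse_triples(output)
# Deduplicated search terms for the URI Detection Agent, in first-seen order.
entities = list(dict.fromkeys(term for s, _, o in triples for term in (s, o)))
predicates = list(dict.fromkeys(p for _, p, _ in triples))
print(entities)    # ['Corfe Castle railway station', 'Swanage Railway', 'Corfe Castle', 'United Kingdom']
print(predicates)  # ['located on', 'located in']
```

`dict.fromkeys` is used for deduplication because, unlike `set`, it keeps the order in which terms first appear.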