# Template Agent

In [2]:
import helper_tools.parser as parser
import importlib
import pandas as pd

importlib.reload(parser)

relation_df, entity_df, docs = parser.synthie_parser("train")


Uploading Entities to Qdrant.



Uploading Predicates to Qdrant.



In [3]:
from langchain_openai import ChatOpenAI
from langchain_ollama.embeddings import OllamaEmbeddings
from langfuse.callback import CallbackHandler
from dotenv import load_dotenv
import os

load_dotenv()
langfuse_handler = CallbackHandler(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host=os.getenv("LANGFUSE_HOST"),
)

model = ChatOpenAI(
    model_name="Meta-Llama-3.3-70B-Instruct",
    base_url="https://api.sambanova.ai/v1",
    api_key=os.getenv("SAMBANOVA_API_KEY"),
)
embeddings = OllamaEmbeddings(model='nomic-embed-text')

In [4]:
target_doc = docs.iloc[0]
doc_id = target_doc["docid"]
text = target_doc["text"]
text

'Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom.'

# Development Space

In [18]:
from langchain_core.prompts import PromptTemplate
import re


def result_checker_agent(state):
    """Ask the model to review the run so far and return its feedback message.

    `state` must provide the keys used in the prompt: call_trace, comments,
    text and results. Extra keys are ignored by the template.
    """
    prompt = PromptTemplate.from_template("""
    You are an expert in monitoring multi-agent systems. Your job is to give the planning agent feedback on the process. To do so, you can see the plans made, the agent calls, and the history of comments. You also have access to a text that should be transformed into triples, which can be inserted into an underlying knowledge graph. This task often requires multiple iterations to catch every entity and relation, especially those that are not visible at first glance. As long as you think the result can be improved, just respond with your feedback, which will be processed by the planner in the next step. Push the result to the limit of what an LLM can do.
    
    In addition to giving feedback, your task is to decide when the multi-agent system has reached a reasonable result. If so, include <FINISH_MAS> in your response. A result is reasonable if it contains only URIs for all entities and relations, all entities and relations can be mapped into the underlying knowledge graph, and all triples can be generated. The output will afterwards be formatted by another agent before it reaches the user.
    
    Agent Call Trace: {call_trace}
    Agent Comments: {comments}
    The provided input text: {text}
    All intermediate results produced during the process: {results}
    """)

    # Pipe the filled-in prompt into the chat model.
    response_chain = prompt | model

    # Trace the call in Langfuse via the callback handler.
    response = response_chain.invoke(state, config={"callbacks": [langfuse_handler]})

    return response
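
The prompt asks the checker to include `<FINISH_MAS>` once the result is good enough, but the cell above does not yet act on that marker. A minimal sketch of how a caller might detect it is shown here; the helper name `split_feedback` and the constant `FINISH_TOKEN` are hypothetical and not part of the notebook.

```python
import re

# Marker named in the checker prompt; the constant name is an assumption.
FINISH_TOKEN = "<FINISH_MAS>"


def split_feedback(content: str) -> tuple[bool, str]:
    """Return (finished, feedback) for a checker response.

    finished is True when the marker occurs anywhere in the text;
    feedback is the text with the marker removed and whitespace trimmed.
    """
    finished = FINISH_TOKEN in content
    feedback = re.sub(re.escape(FINISH_TOKEN), "", content).strip()
    return finished, feedback
```

With the agent above, this could be called as `split_feedback(response.content)`, so that the planner only ever sees the feedback text and the loop stops when `finished` is true.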

In [12]:
mock_plan = """
Given the inputs, the plan to process the provided text into a triple format for closed information extraction using an underlying Knowledge Graph is as follows:

1. **Text Preprocessing**:
   - **Task**: Clean and normalize the input text to prepare it for entity and relation extraction.
   - **Agent**: No specific agent is required for this step; it can be handled by the Agent Instructor itself or through a generic text preprocessing module.
   - **Input**: The provided input text.
   - **Output**: Preprocessed text.

2. **Entity Extraction**:
   - **Task**: Extract entities from the preprocessed text.
   - **Agent**: Entity Extraction Agent.
   - **Input**: Preprocessed text from Step 1.
   - **Output**: List of extracted entities.

3. **Relation Extraction**:
   - **Task**: Extract relations from the preprocessed text.
   - **Agent**: Relation Extraction Agent.
   - **Input**: Preprocessed text from Step 1.
   - **Output**: List of extracted relations.

4. **URI Detection for Entities**:
   - **Task**: For each extracted entity, determine if there is an associated entity in the Knowledge Graph.
   - **Agent**: URI Detection Agent.
   - **Input**: List of extracted entities from Step 2.
   - **Output**: List of entities with their corresponding URIs in the Knowledge Graph.

5. **URI Detection for Relations**:
   - **Task**: For each extracted relation, determine if there is an associated relation in the Knowledge Graph.
   - **Agent**: URI Detection Agent.
   - **Input**: List of extracted relations from Step 3.
   - **Output**: List of relations with their corresponding URIs in the Knowledge Graph.

6. **Triple Formation**:
   - **Task**: Construct triples using the entities and relations with their URIs.
   - **Agent**: No specific agent is required; this can be handled by the Agent Instructor or a generic triple formation module.
   - **Input**: Outputs from Steps 4 and 5.
   - **Output**: List of triples in the format (subject, predicate, object).

7. **Verification and Validation**:
   - **Task**: Verify that the formed triples are valid and consistent with the Knowledge Graph.
   - **Agent**: This step may involve the URI Detection Agent for validation against the Knowledge Graph.
   - **Input**: List of triples from Step 6.
   - **Output**: Final validated list of triples.

Given that this is the first call, the plan is outlined in its entirety. The next task to be executed is **Text Preprocessing** (Step 1), as it is the initial step in preparing the input text for further processing by the specialized agents.

We are currently at the beginning of the plan, with no tasks executed yet. The Agent Call Trace and Agent Comments are empty, and there are no intermediate results. The provided input text is "Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom."

"""
comments = [mock_plan]

mock_call_trace = [('entity_extraction_agent', 'Extract entities from the provided preprocessed text.')]
mock_results = ["Output of entity_extraction_agent: Corfe Castle railway station, Swanage Railway, Corfe Castle, United Kingdom"]

response = result_checker_agent({"text": text, "results": mock_results, "call_trace": mock_call_trace, "comments": comments, "instruction": ""})
print(response.content)

Based on the provided plan and the current state of the multi-agent system, I can see that the entity extraction step has been completed, and the output is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

The next step in the plan is Relation Extraction, which involves extracting relations from the preprocessed text. To proceed, I recommend calling the Relation Extraction Agent with the preprocessed text as input.

Additionally, I would like to suggest that the preprocessed text should be made available for future reference, as it will be required for the Relation Extraction step. 

Please proceed with calling the Relation Extraction Agent. I will provide further feedback after reviewing the output of this step.
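
The mock state is assembled by hand from a call trace and a list of labeled outputs, which makes it easy for a label to drift from the agent that produced it. One possible way to keep both in sync is sketched here; `record_step` is a hypothetical helper, and the state keys are the ones passed to `result_checker_agent` above.

```python
def record_step(state: dict, agent: str, instruction: str, output: str) -> dict:
    """Return a new state with one agent call and its labeled output appended."""
    return {
        **state,
        # The trace stores (agent name, instruction) pairs, as in the mocks.
        "call_trace": state.get("call_trace", []) + [(agent, instruction)],
        # The label is derived from the agent name, so the two cannot disagree.
        "results": state.get("results", []) + [f"Output of {agent}: {output}"],
    }


state = {"text": "", "results": [], "call_trace": [], "comments": [], "instruction": ""}
state = record_step(
    state,
    "entity_extraction_agent",
    "Extract entities.",
    "Corfe Castle, United Kingdom",
)
```

The function returns a new dictionary instead of mutating its argument, so re-running a cell does not append the same step twice to a shared list.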


In [17]:
mock_feedback = """
Based on the provided plan and the current state of the multi-agent system, I can see that the entity extraction step has been completed, and the output is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

The next step in the plan is Relation Extraction, which involves extracting relations from the preprocessed text. To proceed, I recommend calling the Relation Extraction Agent with the preprocessed text as input.

Additionally, I would like to suggest that the preprocessed text should be made available for future reference, as it will be required for the Relation Extraction step. 

Please proceed with calling the Relation Extraction Agent. I will provide further feedback after reviewing the output of this step.
"""

mock_plan_2 = """
Based on the provided information, we are currently at Step 3 of the plan, which is **Relation Extraction**. The previous step, **Entity Extraction**, has been completed, and the output is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

The next task to be executed is to call the **Relation Extraction Agent** with the preprocessed text as input. The preprocessed text is the same as the original input text, as the **Text Preprocessing** step has not been explicitly mentioned as completed, but it can be assumed that the text has been preprocessed for the entity extraction step.

The input for the **Relation Extraction Agent** will be the preprocessed text: "Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom."

The expected output from the **Relation Extraction Agent** will be a list of extracted relations from the text.

Here is the current state of the plan:

1. **Text Preprocessing**: Completed (assumed)
2. **Entity Extraction**: Completed
	* Output: Corfe Castle railway station, Swanage Railway, Corfe Castle, United Kingdom
3. **Relation Extraction**: Pending
	* Input: Preprocessed text
	* Expected Output: List of extracted relations
4. **URI Detection for Entities**: Pending
5. **URI Detection for Relations**: Pending
6. **Triple Formation**: Pending
7. **Verification and Validation**: Pending

Please proceed with calling the **Relation Extraction Agent** with the preprocessed text as input. I will provide further feedback after reviewing the output of this step.

"""

comments += [mock_feedback, mock_plan_2]

mock_call_trace += [("relation_extraction_agent", "Extract relations from the text, focusing on the entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.")]
mock_results += ["""Output of relation_extraction_agent: 
<triple>Corfe Castle railway station, located on, Swanage Railway</triple>
<triple>Corfe Castle railway station, located in, Corfe Castle</triple>
<triple>Corfe Castle, located in, United Kingdom</triple>
<triple>Corfe Castle railway station, located in, United Kingdom</triple>
"""]

response = result_checker_agent({"text": text, "results": mock_results, "call_trace": mock_call_trace, "comments": comments, "instruction": ""})
print(response.content)

Based on the provided plan and the current state of the multi-agent system, I can see that the entity extraction step has been completed, and the output is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

The next step in the plan is Relation Extraction, which involves extracting relations from the preprocessed text. However, I notice that the Relation Extraction Agent has been called multiple times with the same input, and the output is a list of extracted triples. 

The output of the Relation Extraction Agent is:
<triple>Corfe Castle railway station, located on, Swanage Railway</triple>
<triple>Corfe Castle railway station, located in, Corfe Castle</triple>
<triple>Corfe Castle, located in, United Kingdom</triple>
<triple>Corfe Castle railway station, located in, United Kingdom</triple>

I recommend proceeding with the URI Detection for Entities step, which involves determining the URIs for the extracted entities in the K
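
The relation extraction output wraps each candidate in `<triple>` tags with comma-separated parts. A minimal sketch of turning such a string into Python tuples follows; `parse_triples` is a hypothetical helper, and it assumes that no subject, predicate, or object contains a comma (bodies that do not split into exactly three parts are skipped).

```python
import re

# Non-greedy match so that each <triple>...</triple> pair is captured separately.
TRIPLE_RE = re.compile(r"<triple>(.*?)</triple>", re.DOTALL)


def parse_triples(result: str) -> list[tuple[str, str, str]]:
    """Extract (subject, predicate, object) tuples from tagged text."""
    triples = []
    for body in TRIPLE_RE.findall(result):
        parts = [part.strip() for part in body.split(",")]
        if len(parts) == 3:  # skip malformed or comma-containing bodies
            triples.append(tuple(parts))
    return triples
```

Applied to the mock result above, this would yield tuples such as `("Corfe Castle railway station", "located on", "Swanage Railway")`, which are easier to map to URIs than raw text.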

In [20]:
mock_feedback_2 = """
Based on the provided plan and the current state of the multi-agent system, I can see that the entity extraction step has been completed, and the output is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

The next step in the plan is Relation Extraction, which involves extracting relations from the preprocessed text. However, I notice that the Relation Extraction Agent has been called multiple times with the same input, and the output is a list of extracted triples. 

The output of the Relation Extraction Agent is:
<triple>Corfe Castle railway station, located on, Swanage Railway</triple>
<triple>Corfe Castle railway station, located in, Corfe Castle</triple>
<triple>Corfe Castle, located in, United Kingdom</triple>
<triple>Corfe Castle railway station, located in, United Kingdom</triple>

I recommend proceeding with the URI Detection for Entities step, which involves determining the URIs for the extracted entities in the Knowledge Graph. The input for this step will be the list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

Additionally, I suggest proceeding with the URI Detection for Relations step, which involves determining the URIs for the extracted relations in the Knowledge Graph. The input for this step will be the list of extracted relations: located on, located in.

Please proceed with calling the URI Detection Agent for both entities and relations. I will provide further feedback after reviewing the output of these steps. 

The current state of the plan is:
1. **Text Preprocessing**: Completed (assumed)
2. **Entity Extraction**: Completed
	* Output: Corfe Castle railway station, Swanage Railway, Corfe Castle, United Kingdom
3. **Relation Extraction**: Completed
	* Output: <triple>Corfe Castle railway station, located on, Swanage Railway</triple>
				 <triple>Corfe Castle railway station, located in, Corfe Castle</triple>
				 <triple>Corfe Castle, located in, United Kingdom</triple>
				 <triple>Corfe Castle railway station, located in, United Kingdom</triple>
4. **URI Detection for Entities**: Pending
5. **URI Detection for Relations**: Pending
6. **Triple Formation**: Pending
7. **Verification and Validation**: Pending

Please proceed with the next steps. I will review the output and provide further feedback.

"""

mock_plan_3 = """
Based on the provided plan and the current state of the multi-agent system, we are currently at Step 4 and 5 of the plan, which are **URI Detection for Entities** and **URI Detection for Relations**. 

The previous steps, **Entity Extraction** and **Relation Extraction**, have been completed. The output of the **Entity Extraction** step is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom. The output of the **Relation Extraction** step is a list of extracted triples.

The next tasks to be executed are to call the **URI Detection Agent** for both entities and relations. 

For **URI Detection for Entities**, the input will be the list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom. The expected output will be a list of entities with their corresponding URIs in the Knowledge Graph.

For **URI Detection for Relations**, the input will be the list of extracted relations: located on, located in. The expected output will be a list of relations with their corresponding URIs in the Knowledge Graph.

Here is the current state of the plan:

1. **Text Preprocessing**: Completed (assumed)
2. **Entity Extraction**: Completed
	* Output: Corfe Castle railway station, Swanage Railway, Corfe Castle, United Kingdom
3. **Relation Extraction**: Completed
	* Output: 
		<triple>Corfe Castle railway station, located on, Swanage Railway</triple>
		<triple>Corfe Castle railway station, located in, Corfe Castle</triple>
		<triple>Corfe Castle, located in, United Kingdom</triple>
		<triple>Corfe Castle railway station, located in, United Kingdom</triple>
4. **URI Detection for Entities**: Pending
5. **URI Detection for Relations**: Pending
6. **Triple Formation**: Pending
7. **Verification and Validation**: Pending

Please proceed with calling the **URI Detection Agent** for both entities and relations. I will provide further feedback after reviewing the output of these steps. 

The next task is to execute **URI Detection for Entities** and **URI Detection for Relations**. 

Please call the **URI Detection Agent** with the following inputs:
- Entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, United Kingdom
- Relations: located on, located in

I will review the output and provide further feedback.

"""

comments += [mock_feedback_2, mock_plan_3]

mock_call_trace += [('uri_detection_agent', 'Corfe Castle railway station, Swanage Railway, Corfe Castle, United Kingdom')]
mock_results += ["""Output of uri_detection_agent: 
After checking the responses from the URI detection tool, I've compiled an overall mapping of search terms to URIs. Here are the results:

* Corfe Castle railway station: http://www.wikidata.org/entity/Q5170476
* Swanage Railway: http://www.wikidata.org/entity/Q7653559
* Corfe Castle: http://www.wikidata.org/entity/Q1236511
* United Kingdom: http://www.wikidata.org/entity/Q145

The following search terms have multiple possible URIs, but I've selected the one that seems to be the most relevant based on the label:
* Corfe Castle railway station: The tool also returned URIs for Corfe Castle (http://www.wikidata.org/entity/Q1236511) and Corfe Castle (village) (http://www.wikidata.org/entity/Q13341461), but the most relevant one seems to be http://www.wikidata.org/entity/Q5170476.
* Corfe Castle: The tool also returned URIs for Corfe Castle (village) (http://www.wikidata.org/entity/Q13341461) and Corfe Castle railway station (http://www.wikidata.org/entity/Q5170476), but the most relevant one seems to be http://www.wikidata.org/entity/Q1236511.
* United Kingdom: The tool also returned a URI for United States (http://www.wikidata.org/entity/Q30), but this seems to be an error, and the most relevant one is http://www.wikidata.org/entity/Q145.

The following search terms have URIs that don't seem to match exactly, but are still relevant:
* Swanage Railway: The tool also returned a URI for Corfe Castle railway station (http://www.wikidata.org/entity/Q5170476), which is related to Swanage Railway, but not exactly the same thing.

No search terms were found that might not be present in the knowledge graph. All search terms have at least one possible URI mapping.
"""]

response = result_checker_agent({"text": text, "results": mock_results, "call_trace": mock_call_trace, "comments": comments, "instruction": ""})
print(response.content)

Based on the provided plan and the current state of the multi-agent system, I can see that the entity extraction step has been completed, and the output is a list of extracted entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom.

The relation extraction step has also been completed, and the output is a list of extracted triples:

<triple>Corfe Castle railway station, located on, Swanage Railway</triple>
<triple>Corfe Castle railway station, located in, Corfe Castle</triple>
<triple>Corfe Castle, located in, United Kingdom</triple>
<triple>Corfe Castle railway station, located in, United Kingdom</triple>

The URI detection step for entities has been completed, and the output is a list of entities with their corresponding URIs in the Knowledge Graph:

* Corfe Castle railway station: http://www.wikidata.org/entity/Q5170476
* Swanage Railway: http://www.wikidata.org/entity/Q7653559
* Corfe Castle: http://www.wikidata.org/entity/Q1236511
* United Kingdo
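
The URI detection output lists each selected mapping as a `* label: URI` bullet, followed by bullets that only discuss alternatives. A sketch of collecting the selected mappings into a dictionary is given here; `parse_uri_mapping` is a hypothetical helper, and it assumes that the first bullet whose value is a bare URI is the agent's chosen mapping for that label.

```python
import re

# Matches bullets of the form "* <label>: <uri>" where the value is only a URI.
URI_LINE_RE = re.compile(r"^\*\s*(?P<label>[^:]+):\s*(?P<uri>https?://\S+)\s*$")


def parse_uri_mapping(result: str) -> dict[str, str]:
    """Map each label to the first bare URI listed for it."""
    mapping: dict[str, str] = {}
    for line in result.splitlines():
        match = URI_LINE_RE.match(line.strip())
        if match:
            # setdefault keeps the first URI seen for a label.
            mapping.setdefault(match["label"].strip(), match["uri"])
    return mapping
```

Bullets whose value is explanatory prose, such as the notes on alternative candidates, do not match the pattern and are ignored.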