# Template Agent

In [2]:
import helper_tools.parser as parser
import importlib
import pandas as pd

importlib.reload(parser)

relation_df, entity_df, docs = parser.synthie_parser("train")

Uploading Entities to Qdrant.

Uploading Predicates to Qdrant.


In [3]:
from langchain_openai import ChatOpenAI
from langchain_ollama.embeddings import OllamaEmbeddings
from langfuse.callback import CallbackHandler
from dotenv import load_dotenv
import os

load_dotenv()
langfuse_handler = CallbackHandler(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host=os.getenv("LANGFUSE_HOST"),
)

# ChatOpenAI pointed at SambaNova's OpenAI-compatible endpoint to serve Llama 3.3 70B
model = ChatOpenAI(
    model_name="Meta-Llama-3.3-70B-Instruct",
    base_url="https://api.sambanova.ai/v1",
    api_key=os.getenv("SAMBANOVA_API_KEY"),
)
embeddings = OllamaEmbeddings(model='nomic-embed-text')

In [4]:
target_doc = docs.iloc[0]
doc_id = target_doc["docid"]
text = target_doc["text"]
text

'Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom.'

# Development Space

In [17]:
from langchain_core.prompts import PromptTemplate
import re

def result_formatter_agent(state):
    # Final agent: turn the full multi-agent-system state into deduplicated triples wrapped in <ttl> tags
    prompt = PromptTemplate.from_template("""
    You are an expert in formatting the results of multi-agent systems used for closed information extraction. Your task is to produce triples in Turtle format that can be inserted into the underlying knowledge graph. For this, you will get access to the full state of the multi-agent system, including the full call trace, the comments of the planner and the result checker, the provided input text and all intermediate results. Please note that the relation extraction agent outputs more triples than necessary due to its prompting. Reduce the output so that no triple is a duplicate of another. You may include reasoning in your answer, but always enclose the Turtle output in <ttl> tags so that it can be extracted afterwards.
    
    Agent Call Trace: {call_trace}
    Agent Comments: {comments}
    The provided input text: {text}
    All intermediate results produced during the process: {results}
    """)

    # LCEL chain: fill the template from the state dict, then call the chat model
    response_chain = prompt | model

    response = response_chain.invoke(state, config={"callbacks": [langfuse_handler]})

    print(response.content)

    # Keep only the text between <ttl> and </ttl>; store an empty string if no complete block is found
    result_match = re.search(r'<ttl>(.*?)</ttl>', response.content, re.DOTALL)
    result = result_match.group(1) if result_match else ""

    state["results"] += [result]

    return state
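The extraction step in `result_formatter_agent` only works if the model emits a closed `<ttl>` block: when the tags are missing or the response stops before `</ttl>`, `re.search` finds nothing and an empty string is appended to `state["results"]`. The sketch below isolates that step in a hypothetical helper, `extract_ttl`, so it can be checked on its own. Unlike the cell above, it also strips surrounding whitespace; that is a choice made for this sketch, not part of the original agent.

```python
import re

# Matches the first <ttl>...</ttl> block; DOTALL lets "." span newlines
TTL_BLOCK = re.compile(r"<ttl>(.*?)</ttl>", re.DOTALL)


def extract_ttl(content: str) -> str:
    """Hypothetical helper: return the text inside the first <ttl> block, or "".

    An unterminated block (e.g. a response cut off before </ttl>) also yields "".
    """
    match = TTL_BLOCK.search(content)
    return match.group(1).strip() if match else ""


response_text = "Some reasoning.\n<ttl>\nA P B .\n</ttl>\nDone."
print(extract_ttl(response_text))  # prints A P B .
print(repr(extract_ttl("<ttl>\nA P")))  # prints '' because the closing tag is missing
```

Returning an empty string keeps the same contract as the agent, so downstream parsing simply yields no triples instead of raising.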

In [18]:
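import pickle

# Load a previously saved multi-agent-system state so the formatter can be tested in isolation
with open("../state_storage/finish_mas.state", "rb") as f:
    finish_mas_state = pickle.load(f)
state = result_formatter_agent(finish_mas_state)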

Based on the provided information and the current state of the plan execution, I will proceed with integrating the extracted entities and relations into the Knowledge Graph.

The extracted entities and their corresponding URIs are:
- Corfe Castle railway station: http://www.wikidata.org/entity/Q5170476
- Swanage Railway: http://www.wikidata.org/entity/Q7653559
- Corfe Castle: http://www.wikidata.org/entity/Q1236511
- United Kingdom: http://www.wikidata.org/entity/Q145

The extracted relations and their corresponding URIs are:
- "is located on": http://www.wikidata.org/entity/P276 (location)
- "is located in": http://www.wikidata.org/entity/P276 (location) or http://www.wikidata.org/entity/P159 (headquarters location)
- "passes through": http://www.wikidata.org/entity/P81 (railway line(s) subject is directly connected to)

The triples to be inserted into the Knowledge Graph are:
<ttl>
http://www.wikidata.org/entity/Q5170476 http://www.wikidata.org/entity/P276 http://www.wikidata.org/ent

# Final Results

In [27]:
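# Statements end with " ."; newlines become spaces so statements spanning lines still split cleanly
final_result = state["results"][-1].replace("\n", " ").split(" .")
# split() without arguments ignores leading/repeated whitespace; keep only complete subject-predicate-object triples
final_result = [statement.split() for statement in final_result if len(statement.split()) == 3]
final_result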

[['http://www.wikidata.org/entity/Q5170476',
  'http://www.wikidata.org/entity/P276',
  'http://www.wikidata.org/entity/Q7653559'],
 ['http://www.wikidata.org/entity/Q5170476',
  'http://www.wikidata.org/entity/P276',
  'http://www.wikidata.org/entity/Q1236511'],
 ['http://www.wikidata.org/entity/Q5170476',
  'http://www.wikidata.org/entity/P276',
  'http://www.wikidata.org/entity/Q145'],
 ['http://www.wikidata.org/entity/Q7653559',
  'http://www.wikidata.org/entity/P81',
  'http://www.wikidata.org/entity/Q1236511'],
 ['http://www.wikidata.org/entity/Q7653559',
  'http://www.wikidata.org/entity/P276',
  'http://www.wikidata.org/entity/Q145'],
 ['http://www.wikidata.org/entity/Q1236511',
  'http://www.wikidata.org/entity/P276',
  'http://www.wikidata.org/entity/Q145']]
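Splitting on `" ."` and on spaces assumes every statement is written as three bare IRIs followed by a space and a period. Strictly, Turtle writes IRIs in angle brackets (`<http://...>`), so a model that follows the prompt literally would break that parsing. A minimal regex-based alternative, assuming the `<ttl>` block holds only full IRIs (no `@prefix` declarations, literals or blank nodes), could look like this. `parse_triples` is a hypothetical helper; the lookahead after the final `.` keeps a cut-off last statement such as `... http://www.wikidata.org/ent` from being misread as a triple.

```python
import re

# One IRI, written bare (http://...) or Turtle-style (<http://...>); only the IRI itself is captured
IRI = r"<?(https?://[^\s<>]+)>?"
# Subject, predicate and object IRIs separated by whitespace, closed by a "." followed by whitespace or end of text
TRIPLE = re.compile(IRI + r"\s+" + IRI + r"\s+" + IRI + r"\s*\.(?=\s|$)")


def parse_triples(ttl_text: str) -> list[list[str]]:
    """Hypothetical helper: pull [subject, predicate, object] IRIs out of a <ttl> block."""
    return [list(match) for match in TRIPLE.findall(ttl_text)]


sample = """
http://www.wikidata.org/entity/Q5170476 http://www.wikidata.org/entity/P276 http://www.wikidata.org/entity/Q145 .
<http://www.wikidata.org/entity/Q7653559> <http://www.wikidata.org/entity/P81> <http://www.wikidata.org/entity/Q1236511> .
"""
print(len(parse_triples(sample)))  # prints 2
```

Accepting both notations means the parser keeps working whether the model returns the bare-IRI style seen above or proper Turtle IRIs.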

In [28]:
pred_relation_df = pd.DataFrame(final_result, columns=["subject_uri", "predicate_uri", "object_uri"]).drop_duplicates()
pred_relation_df

|   | subject_uri | predicate_uri | object_uri |
|---|---|---|---|
| 0 | http://www.wikidata.org/entity/Q5170476 | http://www.wikidata.org/entity/P276 | http://www.wikidata.org/entity/Q7653559 |
| 1 | http://www.wikidata.org/entity/Q5170476 | http://www.wikidata.org/entity/P276 | http://www.wikidata.org/entity/Q1236511 |
| 2 | http://www.wikidata.org/entity/Q5170476 | http://www.wikidata.org/entity/P276 | http://www.wikidata.org/entity/Q145 |
| 3 | http://www.wikidata.org/entity/Q7653559 | http://www.wikidata.org/entity/P81 | http://www.wikidata.org/entity/Q1236511 |
| 4 | http://www.wikidata.org/entity/Q7653559 | http://www.wikidata.org/entity/P276 | http://www.wikidata.org/entity/Q145 |
| 5 | http://www.wikidata.org/entity/Q1236511 | http://www.wikidata.org/entity/P276 | http://www.wikidata.org/entity/Q145 |
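`pred_relation_df` is named as the predicted counterpart of the `relation_df` returned by `parser.synthie_parser`, which suggests it is meant to be compared with gold triples. The column layout of `relation_df` is not shown in this notebook, so the following is only a sketch of exact-match, micro-averaged precision, recall and F1 over sets of triples; `score_triples` is a hypothetical helper, not part of `helper_tools`.

```python
def score_triples(predicted, gold):
    """Hypothetical helper: exact-match precision/recall/F1 over (subject, predicate, object) triples.

    Both arguments are iterables of 3-item sequences; duplicates collapse because sets are used.
    """
    pred_set = {tuple(triple) for triple in predicted}
    gold_set = {tuple(triple) for triple in gold}
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


predicted = [("s1", "p1", "o1"), ("s1", "p1", "o2")]
gold = [("s1", "p1", "o1"), ("s2", "p2", "o2")]
print(score_triples(predicted, gold))  # prints {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

With DataFrames, rows can be passed as plain tuples via `pred_relation_df.itertuples(index=False, name=None)`, provided the gold triples for `doc_id` are first selected into the same three IRI columns.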
