# Relation Extraction Agent

In [2]:
import helper_tools.parser as parser
import importlib
import pandas as pd

importlib.reload(parser)

relation_df, entity_df, docs = parser.synthie_parser("train")

Fetching 27 files:   0%|          | 0/27 [00:00<?, ?it/s]
100%|██████████| 10/10 [00:00<00:00, 4984.91it/s]

Uploading Entities to Qdrant.
100%|██████████| 46/46 [00:05<00:00,  8.00it/s]

Uploading Predicates to Qdrant.
100%|██████████| 29/29 [00:03<00:00,  9.06it/s]

In [3]:
from langchain_openai import ChatOpenAI
from langchain_ollama.embeddings import OllamaEmbeddings
from langfuse.callback import CallbackHandler
from dotenv import load_dotenv
import os

load_dotenv()
langfuse_handler = CallbackHandler(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host=os.getenv("LANGFUSE_HOST"),
)

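# Chat model served through SambaNova's OpenAI-compatible endpoint.
model = ChatOpenAI(
    model_name="Meta-Llama-3.3-70B-Instruct",
    base_url="https://api.sambanova.ai/v1",
    api_key=os.getenv("SAMBANOVA_API_KEY"),
)
# Local embedding model served by Ollama.
embeddings = OllamaEmbeddings(model="nomic-embed-text")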

In [4]:
target_doc = docs.iloc[0]
doc_id = target_doc["docid"]
text = target_doc["text"]
text

'Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom.'

# Development Space

In [28]:
from langchain_core.prompts import PromptTemplate
import re


def relation_extraction_agent(state):
    prompt = PromptTemplate.from_template("""
    
    You are an expert in relation extraction from text, working within a multi-agent system for closed information extraction. You will receive a text from the state, from which you should extract all relations. Because closed information extraction relies on an underlying knowledge graph, similar predicates can have different names. Therefore, also extract alternative predicates where applicable (e.g. Berlin, located in, Germany -> Berlin, country, Germany).
     
    In addition, the agent_instructor may give you an instruction. Follow this optional instruction together with this system prompt and return a list of all triples, where each triple is enclosed in <triple> tags and its subject, predicate, and object are separated by commas. Enclose only the final result in <result> tags.
    
    The provided input text: {text}
    Instruction: {instruction}
    
    """)
    
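    # Pipe the prompt into the chat model to form a runnable chain.
    response_chain = prompt | model

    # `state` must provide the "text" and "instruction" keys used by the prompt.
    response = response_chain.invoke(state, config={"callbacks": [langfuse_handler]})

    # Show the raw model output once for debugging.
    print(response.content)

    # Keep only the content between the <result> tags; DOTALL lets "." span newlines.
    result_match = re.search(r'<result>(.*?)</result>', response.content, re.DOTALL)

    if result_match:
        result = result_match.group(1)
    else:
        # The model did not wrap its answer in <result> tags.
        result = ""

    return result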
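
The prompt asks the model to wrap each triple in `<triple>` tags with comma-separated fields, so the string returned by `relation_extraction_agent` still has to be split into structured triples before it can be used downstream. A minimal sketch of that step follows. The helper name `parse_triples` and the sample string are hypothetical, and the sketch assumes the model follows the requested format and that no subject, predicate, or object contains a comma.

```python
import re


def parse_triples(result: str) -> list[tuple[str, str, str]]:
    """Split the text inside <result> into (subject, predicate, object) tuples."""
    triples = []
    for body in re.findall(r"<triple>(.*?)</triple>", result, re.DOTALL):
        parts = [part.strip() for part in body.split(",")]
        # Skip malformed entries instead of guessing which comma is a separator.
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


# Hypothetical agent output, used only to illustrate the expected format.
sample_result = """
<triple>Berlin, located in, Germany</triple>
<triple>Berlin, country, Germany</triple>
"""
parsed = parse_triples(sample_result)
print(parsed)
```

Entries that do not split into exactly three non-empty fields are dropped here; logging them instead may be preferable when debugging the prompt.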

In [30]:
mock_instruction = "Extract relations from the text, focusing on the entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom."

result = relation_extraction_agent({"text": text, "instruction": mock_instruction})
print(result)

To extract relations from the given text focusing on the entities Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom, we identify the following relationships:

1. Corfe Castle railway station is located on the Swanage Railway.
2. Corfe Castle railway station is situated in the village of Corfe Castle.
3. Corfe Castle is located in the United Kingdom.

Given these relationships, we can also consider alternative predicates based on the context:

- For "Corfe Castle railway station is located on the Swanage Railway," an alternative predicate could be "served by" or simply "on," but the most direct relation is "located on."
- For "Corfe Castle railway station is situated in the village of Corfe Castle," alternative predicates could be "part of" or "located in."
- For "Corfe Castle is located in the United Kingdom," an alternative predicate could be "country" or "part of," but "located in" is the most straightforward.

Thus, the extracted relations with their al

In [31]:
relation_df[relation_df["docid"] == doc_id]

| Unnamed: 0 | docid | subject | subject_uri | predicate | predicate_uri | object | object_uri |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Corfe_Castle_railway_station | http://www.wikidata.org/entity/Q5170476 | connecting line | http://www.wikidata.org/entity/P81 | Swanage_Railway | http://www.wikidata.org/entity/Q7653559 |
| 1 | 0 | Corfe_Castle_railway_station | http://www.wikidata.org/entity/Q5170476 | named after | http://www.wikidata.org/entity/P138 | Corfe_Castle | http://www.wikidata.org/entity/Q1236511 |
| 2 | 0 | Corfe_Castle_railway_station | http://www.wikidata.org/entity/Q5170476 | located in the administrative territorial entity | http://www.wikidata.org/entity/P131 | Corfe_Castle_(village) | http://www.wikidata.org/entity/Q13341461 |
| 3 | 0 | Corfe_Castle_railway_station | http://www.wikidata.org/entity/Q5170476 | country | http://www.wikidata.org/entity/P17 | United_Kingdom | http://www.wikidata.org/entity/Q145 |
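
The gold rows above use underscore-separated labels and knowledge-graph predicate names, so a direct string comparison with extracted triples would rarely match. One possible way to check agreement is sketched below, assuming a simple normalization (underscores to spaces, lowercasing). The helpers `normalize` and `match_gold` are hypothetical, and `predicted_triples` is illustrative input rather than actual model output.

```python
def normalize(label: str) -> str:
    """Lowercase a label and treat underscores as spaces."""
    return label.replace("_", " ").strip().lower()


def match_gold(predicted, gold):
    """Return the gold triples that also appear among the predicted triples."""
    predicted_set = {tuple(normalize(x) for x in triple) for triple in predicted}
    return [t for t in gold if tuple(normalize(x) for x in t) in predicted_set]


# Gold triples copied from the relation_df rows shown above.
gold_triples = [
    ("Corfe_Castle_railway_station", "connecting line", "Swanage_Railway"),
    ("Corfe_Castle_railway_station", "named after", "Corfe_Castle"),
    (
        "Corfe_Castle_railway_station",
        "located in the administrative territorial entity",
        "Corfe_Castle_(village)",
    ),
    ("Corfe_Castle_railway_station", "country", "United_Kingdom"),
]

# Hypothetical predictions, not actual model output.
predicted_triples = [
    ("Corfe Castle railway station", "country", "United Kingdom"),
    ("Corfe Castle railway station", "located on", "Swanage Railway"),
]

matched = match_gold(predicted_triples, gold_triples)
print(f"{len(matched)} of {len(gold_triples)} gold triples matched")
```

Exact matching after normalization only rewards predicates that already use the knowledge graph's wording, which is why the prompt asks the model for alternative predicates such as "country" alongside "located in".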
