# Relation Extraction Agent

In [2]:
import helper_tools.parser as parser
import importlib
import pandas as pd

importlib.reload(parser)

relation_df, entity_df, docs = parser.synthie_parser("train")

Fetching 27 files:   0%|          | 0/27 [00:00<?, ?it/s]
100%|██████████| 10/10 [00:00<00:00, 4984.91it/s]

Uploading Entities to Qdrant.
100%|██████████| 46/46 [00:05<00:00,  8.00it/s]

Uploading Predicates to Qdrant.
100%|██████████| 29/29 [00:03<00:00,  9.06it/s]

In [3]:
from langchain_openai import ChatOpenAI
from langchain_ollama.embeddings import OllamaEmbeddings
from langfuse.callback import CallbackHandler
from dotenv import load_dotenv
import os

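# Read API keys and the Langfuse host from a local .env file.
load_dotenv()
# Langfuse callback handler, passed to chain calls below to trace LLM requests.
langfuse_handler = CallbackHandler(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host=os.getenv("LANGFUSE_HOST"),
)

# Llama 3.3 70B served through SambaNova's OpenAI-compatible endpoint.
model = ChatOpenAI(
    model_name="Meta-Llama-3.3-70B-Instruct",
    base_url="https://api.sambanova.ai/v1",
    api_key=os.getenv("SAMBANOVA_API_KEY"),
)
# Local embedding model; needs a running Ollama server with nomic-embed-text pulled.
embeddings = OllamaEmbeddings(model='nomic-embed-text')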
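The parser uploads entity and predicate vectors to Qdrant, and `embeddings` is a `nomic-embed-text` model, which suggests that extracted surface forms are later matched against that closed vocabulary by vector similarity. That step is not shown in this section. As a hedged illustration only, the sketch below performs the same kind of nearest-neighbour lookup in plain Python with cosine similarity. The vocabulary, the 3-dimensional vectors and the `nearest_label` helper are all hypothetical stand-ins for real embeddings and a Qdrant query.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_label(query: list[float], vocab: dict[str, list[float]]) -> tuple[str, float]:
    """Return the (label, score) pair whose vector is most similar to `query`."""
    return max(
        ((label, cosine(query, vec)) for label, vec in vocab.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical toy "embeddings" for a closed predicate vocabulary.
predicate_vocab = {
    "located in the administrative territorial entity": [0.9, 0.1, 0.0],
    "part of": [0.1, 0.9, 0.1],
    "country": [0.0, 0.2, 0.9],
}

# Hypothetical vector for an extracted surface form such as "located in".
label, score = nearest_label([0.8, 0.2, 0.1], predicate_vocab)
print(label, round(score, 3))
```

In the notebook itself, a vector database would replace the linear `max` scan, but the ranking criterion (highest cosine similarity wins) is the same idea.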

In [4]:
target_doc = docs.iloc[0]
doc_id = target_doc["docid"]
text = target_doc["text"]
text

'Corfe Castle railway station is a station on the Swanage Railway in the village of Corfe Castle, in the United Kingdom.'

# Development Space

In [19]:
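from langchain_core.prompts import PromptTemplate
import re


def relation_extraction_agent(state):
    # {text} is the input document; {instruction} is the optional hint from
    # the agent_instructor (an empty string when none is given).
    prompt = PromptTemplate.from_template("""
    
    You are an expert in relation extraction from text, working in a multi-agent system for closed information extraction. You will receive a text from the state, from which you should extract all relations. In addition, the agent_instructor might give you an instruction, which you should follow. Your task is to follow the optional instruction as well as this system prompt and return a list of all triples, where each triple is enclosed in <triple> tags and its subject, predicate and object are separated by commas. Enclose your final list of triples in <result> tags.
    
    The provided input text: {text}
    Instruction: {instruction}
    
    """)
    
    # LCEL pipeline: fill the template, then send it to the chat model.
    response_chain = prompt | model
    
    # Default the instruction to "" so a state without one does not raise a
    # missing-variable error in the PromptTemplate.
    response = response_chain.invoke(
        {"text": state["text"], "instruction": state.get("instruction", "")},
        config={"callbacks": [langfuse_handler]},
    )
    
    # Take everything between <result> and </result>; DOTALL lets "." match newlines.
    result_match = re.search(r'<result>(.*?)</result>', response.content, re.DOTALL)
    print(result_match)  # debug: the match object, or None
    if result_match:
        result = result_match.group(1)
    else:
        result = ""  # the model did not follow the <result> format
    print(response.content)  # debug: full model response, including its reasoning
    
    return result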
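The agent returns only the raw text between the `<result>` tags, so turning it into structured triples is left to the caller and is not shown in this notebook. A minimal sketch, assuming the model follows the requested format (one `<triple>` per relation, fields separated by commas): the hypothetical `parse_triples` helper below keeps only tags that split into exactly three non-empty fields, because the comma format becomes ambiguous as soon as an entity name itself contains a comma. The sample string is made up for illustration and is not real model output.

```python
import re

# Non-greedy match of each <triple>...</triple> body; DOTALL tolerates line breaks.
TRIPLE_RE = re.compile(r"<triple>(.*?)</triple>", re.DOTALL)


def parse_triples(result: str) -> list[tuple[str, str, str]]:
    """Parse '<triple>s, p, o</triple>' tags into (subject, predicate, object) tuples.

    Tags that do not split into exactly three non-empty comma-separated fields
    are skipped rather than guessed at.
    """
    triples = []
    for body in TRIPLE_RE.findall(result):
        parts = [part.strip() for part in body.split(",")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


# Hypothetical agent output; the last tag has four fields and is skipped.
sample = """
<triple>Corfe Castle railway station, located on, Swanage Railway</triple>
<triple>Corfe Castle railway station, located in, Corfe Castle</triple>
<triple>Corfe Castle, country, United Kingdom, Europe</triple>
"""
print(parse_triples(sample))
```

Skipping ambiguous tags trades recall for correctness; asking the model for a delimiter that cannot occur in names, or for one tag per field, would remove the ambiguity instead.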

In [20]:
mock_instruction = "Extract relations from the text, focusing on the entities: Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom."

response = relation_extraction_agent({"text": text, "instruction": mock_instruction})
print(response)

<re.Match object; span=(1126, 1423), match='<result>\n<triple>Corfe Castle railway station, l>
To accomplish the task of extracting relations from the given text with a focus on the specified entities (Corfe Castle railway station, Swanage Railway, Corfe Castle, and United Kingdom), we analyze the text as follows:

1. **Corfe Castle railway station** is a station on the **Swanage Railway**. This establishes a relation between Corfe Castle railway station and Swanage Railway, where the predicate could be "located on" or "part of".

2. The station is in the village of **Corfe Castle**, indicating a relation between Corfe Castle railway station and Corfe Castle, with a predicate such as "located in".

3. The village of **Corfe Castle** is in the **United Kingdom**, establishing a relation between Corfe Castle and the United Kingdom, with a predicate like "part of" or "located in".

4. By extension, since Corfe Castle railway station is in Corfe Castle, and Corfe Castle is in the United Ki