# "LLMGraphTransformer"

- https://github.com/langchain-ai/langchain/blob/master/libs/experimental/langchain_experimental/graph_transformers/llm.py#L279
- https://github.com/neo4j-labs/llm-graph-builder/blob/main/backend/src/openAI_llm.py
- https://python.langchain.com/docs/use_cases/graph/constructing/



In [2]:
allowed_nodes = ["Person", "Location", "Event", "Organisation"]
allowed_rels = ["PARTICIPATED_TO", "MEMBER_OF"]

messages = [(
          "system",
          f"""
## 1. Overview
You are a top-tier algorithm designed for extracting information in structured formats to build a knowledge graph.
- **Nodes** represent entities and concepts. They're akin to Wikipedia nodes.
- The aim is to achieve simplicity and clarity in the knowledge graph, making it accessible for a vast audience.
## 2. Labeling Nodes
- **Consistency**: Ensure you use basic or elementary types for node labels.
  - For example, when you identify an entity representing a person, always label it as **"person"**. Avoid using more specific terms like "mathematician" or "scientist".
- **Node IDs**: Never utilize integers as node IDs. Node IDs should be names or human-readable identifiers found in the text.
{'- **Allowed Node Labels:** ' + ", ".join(allowed_nodes) if allowed_nodes else ""}
{'- **Allowed Relationship Types:** ' + ", ".join(allowed_rels) if allowed_rels else ""}
## 3. Handling Numerical Data and Dates
- Numerical data, like age or other related information, should be incorporated as attributes or properties of the respective nodes.
- **No Separate Nodes for Dates/Numbers**: Do not create separate nodes for dates or numerical values. Always attach them as attributes or properties of nodes.
- **Property Format**: Properties must be in a key-value format.
- **Quotation Marks**: Never use escaped single or double quotes within property values.
- **Naming Convention**: Use camelCase for property keys, e.g., `birthDate`.
## 4. Coreference Resolution
- **Maintain Entity Consistency**: When extracting entities, it's vital to ensure consistency.
If an entity, such as "John Doe", is mentioned multiple times in the text but is referred to by different names or pronouns (e.g., "Joe", "he"),
always use the most complete identifier for that entity throughout the knowledge graph. In this example, use "John Doe" as the entity ID.
Remember, the knowledge graph should be coherent and easily understandable, so maintaining consistency in entity references is crucial.
## 5. Strict Compliance
- **Disallowed Values**: Do not use 'Source' as a label for any node, and do not use 'RELATIONSHIP' as the type of any relationship in the graph.
- **Colon Values**: You may encounter a colon (:) in the content (for example, in time references or in a title's description after a heading).
Treat such content as plain text, not as a dictionary. For example, if the time "10:00" is mentioned, consider it part of the text
content, not a data structure.
Adhere to the rules strictly. Non-compliance will result in termination.
          """),
            ("human", "Use the given format to extract information from the following input: {input}"),
            ("human", "Tip: Make sure to answer in the correct format"),
]
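
The two conditional lines in section 2 of the prompt work because Python parses `A + B if cond else ""` as `(A + B) if cond else ""`, so each bullet is emitted whole or dropped entirely. A minimal sketch of that rendering, pulled out into a helper so it can be checked without calling an LLM (the name `schema_lines` is hypothetical and not part of LangChain; the space after each bold label is a choice made in this sketch):

```python
from typing import List, Optional


def schema_lines(allowed_nodes: Optional[List[str]],
                 allowed_rels: Optional[List[str]]) -> str:
    """Render the optional allow-list bullets for the system prompt.

    Hypothetical helper: an empty or missing list contributes no line at all.
    """
    lines = []
    if allowed_nodes:
        lines.append("- **Allowed Node Labels:** " + ", ".join(allowed_nodes))
    if allowed_rels:
        lines.append("- **Allowed Relationship Types:** " + ", ".join(allowed_rels))
    return "\n".join(lines)


rendered = schema_lines(
    ["Person", "Location", "Event", "Organisation"],
    ["PARTICIPATED_TO", "MEMBER_OF"],
)
print(rendered)
```

Printing the rendered fragment before sending the prompt is a cheap way to catch a missing separator or an empty allow-list.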

In [4]:
import os
from dotenv import load_dotenv
load_dotenv()

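# ChatOpenAI reads its key from the OPENAI_API_KEY environment variable,
# which load_dotenv() populates from the local .env file.
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")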

In [6]:
text = """Edward Jones (7 April 1824 – c. 1893 or 1896), also known as "the boy Jones", was an English stalker who became notorious for breaking into Buckingham Palace several times between 1838 and 1841.

Jones was fourteen years old when he first broke into the palace in December 1838. He was found in possession of some items he had stolen, but was acquitted at his trial. He broke in again in 1840, ten days after Queen Victoria had given birth to Princess Victoria. Staff found him hiding under a sofa and he was arrested and subsequently questioned by the Privy Council—the monarch's formal body of advisers. He was sentenced to three months' hard labour at Tothill Fields Bridewell prison. He was released in March 1841 and broke back into the palace two weeks later, where he was caught stealing food from the larders. He was again arrested and sentenced to three months' hard labour at Tothill Fields.
"""

In [7]:
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer

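# temperature=0 keeps the extraction as repeatable as the model allows.
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

# Note: `messages`, `allowed_nodes`, and `allowed_rels` defined earlier are not
# passed here, so the transformer runs with its default prompt and no schema limits.
llm_transformer = LLMGraphTransformer(llm=llm)

# Wrap the raw text in a Document and extract one graph document per input.
documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes:{graph_documents[0].nodes}")
print(f"Relationships:{graph_documents[0].relationships}")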

Nodes:[Node(id='Edward Jones', type='Person'), Node(id='Buckingham Palace', type='Place'), Node(id='Queen Victoria', type='Person'), Node(id='Princess Victoria', type='Person'), Node(id='Tothill Fields Bridewell Prison', type='Place')]
Relationships:[Relationship(source=Node(id='Edward Jones', type='Person'), target=Node(id='Buckingham Palace', type='Place'), type='BROKE_INTO'), Relationship(source=Node(id='Edward Jones', type='Person'), target=Node(id='Queen Victoria', type='Person'), type='BROKE_INTO'), Relationship(source=Node(id='Edward Jones', type='Person'), target=Node(id='Princess Victoria', type='Person'), type='BROKE_INTO'), Relationship(source=Node(id='Edward Jones', type='Person'), target=Node(id='Tothill Fields Bridewell Prison', type='Place'), type='SENTENCED_TO')]
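
The run above returns `Place` nodes and `BROKE_INTO` / `SENTENCED_TO` relationships, none of which appear in `allowed_nodes` or `allowed_rels`. One possible way to check extracted output against such allow-lists is sketched below. It uses plain dataclasses as stand-ins for LangChain's `Node` and `Relationship` so that it runs without an API key; the stand-in classes, the `filter_graph` helper, and the case-insensitive comparison are all assumptions made for illustration, not library behaviour.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """Stand-in for the LangChain graph Node (id + type only)."""
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    """Stand-in for the LangChain graph Relationship."""
    source: Node
    target: Node
    type: str


def filter_graph(
    nodes: List[Node],
    relationships: List[Relationship],
    allowed_nodes: List[str],
    allowed_rels: List[str],
) -> Tuple[List[Node], List[Relationship]]:
    """Keep only nodes and relationships whose types are on the allow-lists."""
    node_types = {t.lower() for t in allowed_nodes}
    rel_types = {t.lower() for t in allowed_rels}
    kept_nodes = [n for n in nodes if n.type.lower() in node_types]
    # A relationship survives only if its type and both endpoints are allowed.
    kept_rels = [
        r for r in relationships
        if r.type.lower() in rel_types
        and r.source.type.lower() in node_types
        and r.target.type.lower() in node_types
    ]
    return kept_nodes, kept_rels


jones = Node("Edward Jones", "Person")
palace = Node("Buckingham Palace", "Place")
queen = Node("Queen Victoria", "Person")

kept_nodes, kept_rels = filter_graph(
    [jones, palace, queen],
    [Relationship(jones, palace, "BROKE_INTO")],
    ["Person", "Location", "Event", "Organisation"],
    ["PARTICIPATED_TO", "MEMBER_OF"],
)
print([n.id for n in kept_nodes])  # ['Edward Jones', 'Queen Victoria']
print(kept_rels)                   # []
```

Post-filtering like this discards information rather than steering the model. The LangChain documentation linked at the top instead shows `allowed_nodes` and `allowed_relationships` being passed to `LLMGraphTransformer` to constrain extraction up front.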
