# Knowledge graph with LangChain demonstration
It is better to use a virtual environment so that you do not interfere with your other Python configurations.
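
As a quick sanity check before installing anything, the snippet below reports whether the running interpreter sits inside a virtual environment. It is a minimal sketch that assumes a `venv`-style environment; the helper name `in_virtualenv` is made up for this example, and conda environments are not detected by this test.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("Virtual environment active:", in_virtualenv())
```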

## Installing dependencies and modules
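We install or upgrade openai, langchain, langchain-experimental and langchain-openai, add python-dotenv for .env file support, pyvis for the visualization, and wikipedia for the demo data. The `--user` flag targets the user site-packages; inside a virtual environment, drop it, since pip normally refuses `--user` installs there.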

In [8]:
%pip install --quiet --user --upgrade openai langchain langchain-experimental langchain-openai python-dotenv pyvis wikipedia


Note: you may need to restart the kernel to use updated packages.




## Load the API key and verify which Python we are running

In [9]:
from dotenv import load_dotenv
import os
import sys

# Print the Python executable path
print(sys.executable)

# Load the .env file
load_dotenv()
# Get API key from environment variable 
# (make sure the key is present in .env file in the project directory)
api_key = os.getenv("OPENAI_API_KEY")

c:\Python312\python.exe
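
`os.getenv` silently returns `None` when the key is missing, so the failure only shows up later at the first API call. A small guard like the one below surfaces the problem right away. This is a sketch: `require_env` is a hypothetical helper, and `DEMO_API_KEY` with its value is a placeholder used only to make the example self-contained.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to the .env file or the environment."
        )
    return value

# Placeholder variable so the example runs without a real key.
os.environ["DEMO_API_KEY"] = "sk-placeholder"
print(len(require_env("DEMO_API_KEY")))
```

In the notebook, the same guard could be applied as `api_key = require_env("OPENAI_API_KEY")` after `load_dotenv()`.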


In [25]:
from pyvis.network import Network
import openai

client = openai.OpenAI(api_key=api_key)

# Define the function schema (structured output)
functions = [
    {
        "name": "extract_knowledge_graph",
        "description": "Extracts entities and relationships for a knowledge graph",
        "parameters": {
            "type": "object",
            "properties": {
                "entities": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "id": {"type": "string"},
                            "label": {"type": "string"},
                            "type": {"type": "string"},
                        },
                        "required": ["id", "label", "type"],
                    }
                },
                "relationships": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "source": {"type": "string"},
                            "target": {"type": "string"},
                            "relation": {"type": "string"},
                        },
                        "required": ["source", "target", "relation"],
                    }
                },
            },
            "required": ["entities", "relationships"],
        },
    }
]

# Sample input text
text = """
Marie Curie discovered radium and polonium. She won the Nobel Prize in Physics in 1903. 
Marie Curie was born in Warsaw, Poland.
Marie Curie was born on November 7, 1867.
Marie Curie's husband, Pierre Curie, was also a physicist and won the Nobel Prize in Physics in 1903.
Pierre Curie was born in Paris, France.
Pierre Curie was born on May 15, 1859.
Pierre Curie died on April 19, 1906.
Pierre Curie was a French physicist who worked with Marie Curie on radioactivity.
Pierre Curie and Marie Curie were married in 1895.
Pierre Curie and Marie Curie had two daughters, Irène Joliot-Curie and Ève Curie.
Pierre Curie and Marie Curie were both awarded the Nobel Prize in Physics in 1903 for their joint research on radioactivity.
Pierre Curie and Marie Curie are both considered pioneers in the field of radioactivity.
"""
t = """
Albert Einstein developed the theory of relativity and won the Nobel Prize in Physics in 1921. 
Alan Turing is known for his work in computer science and artificial intelligence. 
Alan Turing was a key figure in the development of theoretical computer science and artificial intelligence.
Alan Turing was also instrumental in breaking the Enigma code during World War II.
Alan Turing was born in London, England.
Alan Turing was born on June 23, 1912.
Alan Turing died on June 7, 1954.
Alan Turing was a British mathematician and logician.
During the Second World War, Alan Turing worked at Bletchley Park, where he led the team that broke the Enigma code.
Albert Einstein was born in Ulm, Germany.
Albert Einstein was born on March 14, 1879.
Albert Einstein died on April 18, 1955.
In 1939, Albert Einstein signed a letter to President Roosevelt that helped prompt the Manhattan Project, which developed the atomic bomb.
Albert Einstein was a German-born theoretical physicist who developed the theory of relativity.
The Second World War was a global war that lasted from 1939 to 1945.
Alan Turing and Albert Einstein are both considered pioneers in their respective fields of computer science and theoretical physics.

"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an assistant that extracts knowledge graphs."},
        {"role": "user", "content": text}
    ],
    functions=functions,
    function_call={"name": "extract_knowledge_graph"}
)

# Parse the JSON arguments returned by the function call
# (this is the structured knowledge graph)
import json

args = json.loads(response.choices[0].message.function_call.arguments)

# Pretty-print the JSON
print(json.dumps(args, indent=2))

entities = args["entities"]
relationships = args["relationships"]


# Create and visualize graph with pyvis
net = Network(height="1000px", width="100%", notebook=True, directed=True, bgcolor="#222222", font_color="white")
net.set_options("""
        {
            "physics": {
                "forceAtlas2Based": {
                    "gravitationalConstant": -100,
                    "centralGravity": 0.01,
                    "springLength": 200,
                    "springConstant": 0.08
                },
                "minVelocity": 0.75,
                "solver": "forceAtlas2Based"
            }
        }
        """)
for entity in entities:
    net.add_node(entity["id"], label=entity["label"], title=entity["type"])

for rel in relationships:
    net.add_edge(rel["source"], rel["target"], label=rel["relation"])

output_file = "knowledge_graph_gpt.html"
net.show(output_file)

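# Open the generated HTML file in the default browser,
# using a proper file URI built from the absolute path
import webbrowser
from pathlib import Path
webbrowser.open(Path(output_file).resolve().as_uri())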


{
  "entities": [
    {
      "id": "1",
      "label": "Marie Curie",
      "type": "Person"
    },
    {
      "id": "2",
      "label": "radium",
      "type": "Element"
    },
    {
      "id": "3",
      "label": "polonium",
      "type": "Element"
    },
    {
      "id": "4",
      "label": "Nobel Prize in Physics",
      "type": "Award"
    },
    {
      "id": "5",
      "label": "Warsaw, Poland",
      "type": "Place"
    },
    {
      "id": "6",
      "label": "November 7, 1867",
      "type": "Date"
    },
    {
      "id": "7",
      "label": "Pierre Curie",
      "type": "Person"
    },
    {
      "id": "8",
      "label": "Physics",
      "type": "Field"
    },
    {
      "id": "9",
      "label": "Paris, France",
      "type": "Place"
    },
    {
      "id": "10",
      "label": "May 15, 1859",
      "type": "Date"
    },
    {
      "id": "11",
      "label": "April 19, 1906",
      "type": "Date"
    },
    {
      "id": "12",
      "label": "Ir\u00e8ne Joliot-Cur

True
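
The `functions` and `function_call` parameters used in the call above are the older form of the Chat Completions API; OpenAI documents them as deprecated in favor of `tools` and `tool_choice`. The sketch below only builds the newer payload from a legacy schema and makes no network call. The helper names `to_tools` and `force_tool` are made up for this example, and the schema is a trimmed-down stand-in for the one defined in the cell.

```python
def to_tools(functions):
    """Wrap legacy function schemas in the `tools` format."""
    return [{"type": "function", "function": schema} for schema in functions]

def force_tool(name):
    """Build a `tool_choice` value that forces a call to one named function."""
    return {"type": "function", "function": {"name": name}}

# Trimmed-down stand-in for the schema defined earlier (illustrative only).
legacy_functions = [
    {
        "name": "extract_knowledge_graph",
        "description": "Extracts entities and relationships for a knowledge graph",
        "parameters": {"type": "object", "properties": {}},
    }
]

tools = to_tools(legacy_functions)
tool_choice = force_tool("extract_knowledge_graph")
print(tools[0]["type"], tool_choice["function"]["name"])
```

With this form, the request would pass `tools=tools, tool_choice=tool_choice`, and the JSON arguments would be read from `response.choices[0].message.tool_calls[0].function.arguments` instead of `message.function_call.arguments`.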

## Connect LangChain to OpenAI gpt-4o for structured output

In [45]:
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model="gpt-4o")

graph_transformer = LLMGraphTransformer(llm=llm)

## Access Wikipedia and grab a few articles
Here we had a problem with the Albert Camus page: it could not be loaded by name, so I use its pageid instead.

In [14]:
# Importing the Wikipedia library
import wikipedia

# Read Wikipedia page for Jean-Paul Sartre
sartre_summary = wikipedia.page("Jean-Paul Sartre").content

# save the summary to a file
with open("wikipedia_sartre.txt", "w", encoding="utf-8") as f:
    f.write(sartre_summary)

print(sartre_summary[:500]) 


Jean-Paul Charles Aymard Sartre (, US also ; French: [saʁtʁ]; 21 June 1905 – 15 April 1980) was a French philosopher, playwright, novelist, screenwriter, political activist, biographer, and literary critic, considered a leading figure in 20th-century French philosophy and Marxism. Sartre was one of the key figures in the philosophy of existentialism (and phenomenology). His work has influenced sociology, critical theory, post-colonial theory, and literary studies. He was awarded the 1964 Nobel P


In [15]:

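# Read Wikipedia page for Albert Camus
# wikipedia.search returns a list of matching page titles (not page IDs);
# it is used here only to check that the search finds something
search_results = wikipedia.search("Albert Camus")
if not search_results:
    raise ValueError("No page found for Albert Camus")
# The Albert Camus page could not be retrieved by title, so fetch it by page ID
camus_summary = wikipedia.page(pageid=983).content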

# save the summary to a file
with open("wikipedia_camus.txt", "w", encoding="utf-8") as f:
    f.write(camus_summary)

# Print the first 500 characters of the summaries
print(camus_summary[:500])


Albert Camus ( ka-MOO; French: [albɛʁ kamy] ; 7 November 1913 – 4 January 1960) was a French philosopher, author, dramatist, journalist, world federalist, and political activist. He was the recipient of the 1957 Nobel Prize in Literature at the age of 44, the second-youngest recipient in history. His works include The Stranger, The Plague, The Myth of Sisyphus, The Fall and The Rebel.
Camus was born in French Algeria to pied-noir parents. He spent his childhood in a poor neighbourhood and later 
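
One way to make the lookup less fragile is to try the exact title first and fall back to the page ID only when that fails. The sketch below assumes the title problem comes from the `auto_suggest` default of `wikipedia.page`, which may rewrite the title before the lookup; the notebook does not confirm that this was the cause. `fetch_page_content` and its `page_fn` parameter are hypothetical names introduced here so the logic can be exercised offline.

```python
def fetch_page_content(title, pageid=None, page_fn=None):
    """Try an exact-title lookup first, then fall back to a page ID."""
    if page_fn is None:
        # Imported lazily so the sketch can be tested without the package.
        import wikipedia
        page_fn = wikipedia.page
    try:
        # auto_suggest=False asks the library to use the title as given.
        return page_fn(title, auto_suggest=False).content
    except Exception:
        # Broad catch for brevity; with no page ID to fall back on, re-raise.
        if pageid is None:
            raise
        return page_fn(pageid=pageid).content

# Possible usage in the notebook (requires network access):
# camus_summary = fetch_page_content("Albert Camus", pageid=983)
```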


In [3]:

# Read Wikipedia page for Game of Thrones
got_summary = wikipedia.page(pageid=20715044).content

# save the summary to a file
with open("wikipedia_got.txt", "w", encoding="utf-8") as f:
    f.write(got_summary)

# Print the first 500 characters of the summaries
print(got_summary[:500])


Game of Thrones is an American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It is an adaptation of A Song of Ice and Fire, a series of high fantasy novels by George R. R. Martin, the first of which is A Game of Thrones. The show premiered on HBO in the United States on April 17, 2011, and concluded on May 19, 2019, with 73 episodes broadcast over eight seasons.
Set on the fictional continents of Westeros and Essos, Game of Thrones has a large ensemble cast an


## Convert the text document to a graph
Here we use the LangChain graph transformer to extract a graph-structured representation of our text.

In [47]:
documents = [Document(page_content=camus_summary)]
graph_documents = await graph_transformer.aconvert_to_graph_documents(documents)
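
The top-level `await` works here because the notebook already runs an event loop. In a plain Python script the coroutine has to be driven explicitly, for example with `asyncio.run`. The sketch below shows that pattern with `fake_convert`, a hypothetical stand-in for `graph_transformer.aconvert_to_graph_documents`, so it runs without an API key.

```python
import asyncio

async def fake_convert(documents):
    """Hypothetical stand-in for graph_transformer.aconvert_to_graph_documents."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return [f"graph for: {doc}" for doc in documents]

async def main(documents):
    # In a real script this line would await the transformer instead.
    return await fake_convert(documents)

if __name__ == "__main__":
    results = asyncio.run(main(["some text"]))
    print(results)
```

The transformer also offers a synchronous `convert_to_graph_documents` method, which avoids the event loop entirely when concurrency is not needed.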

## print the generated graph in text format

In [48]:
from pprint import pprint

# For displaying nodes
print("NODES:")
pprint([{"id": node.id, "type": node.type, "properties": node.properties} for node in graph_documents[0].nodes])

# For displaying relationships
print("\nRELATIONSHIPS:")
pprint([{
    "source": rel.source.id,
    "type": rel.type,
    "target": rel.target.id
} for rel in graph_documents[0].relationships])

NODES:
[{'id': 'Albert Camus', 'properties': {}, 'type': 'Person'},
 {'id': 'Catherine Hélène Camus', 'properties': {}, 'type': 'Person'},
 {'id': 'Lucien Camus', 'properties': {}, 'type': 'Person'},
 {'id': 'Louis Germain', 'properties': {}, 'type': 'Person'},
 {'id': 'Jean Grenier', 'properties': {}, 'type': 'Person'},
 {'id': 'Simone Hié', 'properties': {}, 'type': 'Person'},
 {'id': 'Francine Faure', 'properties': {}, 'type': 'Person'},
 {'id': 'Jean-Paul Sartre', 'properties': {}, 'type': 'Person'},
 {'id': 'María Casares', 'properties': {}, 'type': 'Person'},
 {'id': 'Michel Gallimard', 'properties': {}, 'type': 'Person'},
 {'id': 'Nobel Prize In Literature', 'properties': {}, 'type': 'Award'},
 {'id': 'The Stranger', 'properties': {}, 'type': 'Literary work'},
 {'id': 'The Plague', 'properties': {}, 'type': 'Literary work'},
 {'id': 'The Myth Of Sisyphus', 'properties': {}, 'type': 'Literary work'},
 {'id': 'The Fall', 'properties': {}, 'type': 'Literary work'},
 {'id': 'The Reb
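
For a long article the full node list is hard to scan, so a per-type count can give a quicker overview. This is a minimal sketch: `count_node_types` is a made-up helper, and the sample nodes are stand-ins that only mimic the `id` and `type` attributes shown in the output above.

```python
from collections import Counter
from types import SimpleNamespace

def count_node_types(nodes):
    """Count how many nodes carry each `type` label."""
    return Counter(node.type for node in nodes)

# Stand-in nodes mimicking the attributes printed above.
sample_nodes = [
    SimpleNamespace(id="Albert Camus", type="Person"),
    SimpleNamespace(id="Jean-Paul Sartre", type="Person"),
    SimpleNamespace(id="The Stranger", type="Literary work"),
]

for node_type, count in count_node_types(sample_nodes).most_common():
    print(f"{node_type}: {count}")
```

In the notebook the call would be `count_node_types(graph_documents[0].nodes)`.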

## Draw the graph using pyvis
Here we introduce a function to display the graph in a nice interactive HTML page.

In [55]:
import os
import webbrowser
from pathlib import Path

from pyvis.network import Network

def visualize_graph(graph_documents, output_file="knowledge_graph.html"):
    """Render the first graph document as an interactive HTML page."""
    # Create network
    net = Network(height="1200px", width="100%", directed=True,
                  notebook=False, bgcolor="#222222", font_color="white")

    nodes = graph_documents[0].nodes
    relationships = graph_documents[0].relationships

    # Build lookup for valid nodes
    node_dict = {node.id: node for node in nodes}

    # Keep only edges whose endpoints are both known nodes,
    # and collect the IDs of the nodes those edges connect
    valid_edges = []
    valid_node_ids = set()
    for rel in relationships:
        if rel.source.id in node_dict and rel.target.id in node_dict:
            valid_edges.append(rel)
            valid_node_ids.update([rel.source.id, rel.target.id])

    # Add connected nodes (nodes without any valid edge are not drawn)
    for node_id in valid_node_ids:
        node = node_dict[node_id]
        try:
            net.add_node(node.id, label=node.id, title=node.type, group=node.type)
        except Exception:
            continue  # skip nodes that pyvis rejects

    # Add valid edges
    for rel in valid_edges:
        try:
            net.add_edge(rel.source.id, rel.target.id, label=rel.type.lower())
        except Exception:
            continue  # skip edges that pyvis rejects

    # Configure physics
    net.set_options("""
            {
                "physics": {
                    "forceAtlas2Based": {
                        "gravitationalConstant": -100,
                        "centralGravity": 0.01,
                        "springLength": 200,
                        "springConstant": 0.08
                    },
                    "minVelocity": 0.75,
                    "solver": "forceAtlas2Based"
                }
            }
            """)

    net.save_graph(output_file)
    output_path = os.path.abspath(output_file)
    print(f"Graph saved to {output_path}")

    # Try to open in browser, using a proper file URI
    try:
        webbrowser.open(Path(output_path).as_uri())
    except Exception:
        print("Could not open browser automatically")
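
The function above silently drops any relationship whose source or target is not in the node list. When debugging an extraction, it can help to see which edges were discarded. The sketch below isolates that filtering step; `split_edges` is a hypothetical helper, and the sample objects are stand-ins that only mimic the attributes the function relies on.

```python
from types import SimpleNamespace

def split_edges(nodes, relationships):
    """Separate relationships into drawable ones and ones with an unknown endpoint."""
    known_ids = {node.id for node in nodes}
    valid, dropped = [], []
    for rel in relationships:
        if rel.source.id in known_ids and rel.target.id in known_ids:
            valid.append(rel)
        else:
            dropped.append(rel)
    return valid, dropped

# Stand-in objects; "Unknown Node" is deliberately missing from the node list.
camus = SimpleNamespace(id="Albert Camus", type="Person")
plague = SimpleNamespace(id="The Plague", type="Literary work")
ghost = SimpleNamespace(id="Unknown Node", type="Person")
rels = [
    SimpleNamespace(source=camus, target=plague, type="AUTHOR"),
    SimpleNamespace(source=camus, target=ghost, type="KNOWS"),
]

valid, dropped = split_edges([camus, plague], rels)
print(len(valid), "drawable,", len(dropped), "dropped")
```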


## use the function to draw the graph

In [54]:
        
# Run the function
visualize_graph(graph_documents)

Graph saved to d:\Users\makbar\OneDrive - Rayonnance Technologies\0.Rayonnance\0.AI\knowledge-graph\knowledge_graph.html


## adding filters for nodes and relationships of interest

In [50]:
allowed_nodes = ["Person", "Location", "Award", "BornIn", "Organization", "ResearchField"]
allowed_relationships = [
    ("Person", "WORKS_AT", "Organization"),
    ("Person", "SPOUSE", "Person"),
    ("Person", "AWARD", "Award"),
    ("Organization", "IN_LOCATION", "Location"),
    ("Person", "FIELD_OF_RESEARCH", "ResearchField")
]
graph_transformer_rel_defined = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=allowed_nodes,
    allowed_relationships=allowed_relationships,
)
graph_documents_rel_defined = await graph_transformer_rel_defined.aconvert_to_graph_documents(documents)
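
The allowed lists are meant to constrain what the model returns, but since the output comes from an LLM it may be worth checking that the result really stays within them. That possible drift is an assumption, not something observed in this notebook. The sketch below lists nodes whose type falls outside the allowed set; `nodes_outside_schema` is a made-up helper, and the sample nodes are stand-ins.

```python
from types import SimpleNamespace

def nodes_outside_schema(nodes, allowed_nodes):
    """Return nodes whose type is not in the allowed list (case-insensitive)."""
    allowed = {name.lower() for name in allowed_nodes}
    return [node for node in nodes if node.type.lower() not in allowed]

# Stand-in nodes; "Literary work" is not part of the allowed list below.
sample = [
    SimpleNamespace(id="Albert Camus", type="Person"),
    SimpleNamespace(id="The Plague", type="Literary work"),
]

unexpected = nodes_outside_schema(sample, ["Person", "Award"])
print([node.id for node in unexpected])
```

In the notebook the call would be `nodes_outside_schema(graph_documents_rel_defined[0].nodes, allowed_nodes)`.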

## drawing filtered network

In [51]:
# Visualize graph
visualize_graph(graph_documents_rel_defined, output_file="knowledge_graph_restricted.html")

Graph saved to d:\Users\makbar\OneDrive - Rayonnance Technologies\0.Rayonnance\0.AI\knowledge-graph\knowledge_graph_restricted.html
