# Knowledge Graph: Unveiling Hidden Connections
Understanding how different pieces of information relate to one another is often crucial. Knowledge graphs make these connections explicit: they turn unstructured text into a structured network of entities and the relationships between them, which can then be visualized and explored. In this project, we walk through a simple workflow for building a knowledge graph from text, making complex information more accessible and easier to understand.

---

## Knowledge Graphs: 101

### Knowledge Graphs vs. Knowledge Bases

The terms "knowledge graph" and "knowledge base" are often used interchangeably, but they differ subtly. A knowledge base (KB) is any structured collection of information about a domain of interest. A knowledge graph, on the other hand, is a knowledge base structured as a graph, where nodes represent entities and edges represent relations between those entities. For example, from the text "Alice lives in Wonderland" we can extract the relation triplet `<Alice, lives in, Wonderland>`, where "Alice" and "Wonderland" are entities and "lives in" is the relation between them.
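To make the distinction concrete, here is a minimal plain-Python sketch that is not tied to any library: the knowledge base is just a list of triples, and the knowledge graph is the same facts arranged as an adjacency map from each entity to its outgoing `(relation, object)` edges. The `Triple` type, the `to_graph` helper, and the second fact are hypothetical, added only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One fact: <subject, relation, object>."""
    subject: str
    relation: str
    obj: str


# Knowledge base: a flat collection of facts about the domain.
kb: List[Triple] = [
    Triple("Alice", "lives in", "Wonderland"),
    Triple("Alice", "is friends with", "White Rabbit"),  # hypothetical extra fact
]


def to_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Knowledge graph: map each entity (node) to its outgoing (relation, object) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.relation, t.obj))
        graph.setdefault(t.obj, [])  # objects are nodes too, even without outgoing edges
    return dict(graph)


kg = to_graph(kb)
print(kg["Alice"])
```

The facts are identical in both forms; the graph form simply makes "what is connected to Alice?" a direct lookup instead of a scan over every triple.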

### Building a Knowledge Graph

The process of building a knowledge graph usually consists of two sequential steps:
- **Named Entity Recognition** (NER): This step involves extracting entities from the text, which will eventually become the nodes of the knowledge graph.
- **Relation Classification** (RC): In this step, relations between entities are extracted, forming the edges of the knowledge graph.
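The two-step pipeline can be sketched with deliberately toy components: a fixed entity list stands in for a trained NER model, and a single regular expression stands in for a relation classifier. Everything here (`KNOWN_ENTITIES`, `RELATION_PATTERNS`, and both functions) is hypothetical and only shows how NER output becomes the nodes and RC output becomes the edges; real systems use trained models or, as in this project, an LLM prompt.

```python
import re
from typing import List, Tuple

# Toy gazetteer standing in for a real NER model (illustration only).
KNOWN_ENTITIES = ["Alice", "Wonderland"]

# Toy relation patterns standing in for a real relation classifier (illustration only).
RELATION_PATTERNS = {"lives in": re.compile(r"\blives in\b")}


def recognize_entities(text: str) -> List[str]:
    """Step 1 (NER): return known entity mentions in order of appearance."""
    found = [(text.find(e), e) for e in KNOWN_ENTITIES if e in text]
    return [e for _, e in sorted(found)]


def classify_relations(text: str, entities: List[str]) -> List[Tuple[str, str, str]]:
    """Step 2 (RC): label each ordered entity pair whose in-between text matches a pattern."""
    triples = []
    for i, head in enumerate(entities):
        for tail in entities[i + 1:]:
            # Text between the end of the head mention and the start of the tail mention
            between = text[text.find(head) + len(head):text.find(tail)]
            for label, pattern in RELATION_PATTERNS.items():
                if pattern.search(between):
                    triples.append((head, label, tail))
    return triples


sentence = "Alice lives in Wonderland."
entities = recognize_entities(sentence)             # become the nodes
print(classify_relations(sentence, entities))       # become the edges
```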

The resulting graph is then commonly visualized with libraries such as `pyvis`. In this project, we perform Named Entity Recognition and Relation Classification in a single step by prompting an LLM with an appropriate prompt. This joint task is commonly called **Relation Extraction** (RE).

## Workflow for Creating Knowledge Graphs from Textual Data

Here’s what we are going to do in this project.
<br/>
<img src="../../images/knowledge-graph-workflow.png" alt="Knowledge Graph Workflow" style="width: 80%; height: auto;"/>

## Setup

```python
import os

import openai
from dotenv import find_dotenv, load_dotenv

# Load the Azure OpenAI settings from a local .env file into the environment
_ = load_dotenv(find_dotenv())

# Configure the openai module for Azure OpenAI from those environment variables
openai.api_type = os.environ.get("OPENAI_API_TYPE")
openai.api_base = os.environ.get("OPENAI_API_BASE")
openai.api_version = os.environ.get("OPENAI_API_VERSION")
openai.api_key = os.environ.get("OPENAI_API_KEY")
```
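Before building any chains, it can help to catch a missing setting early rather than through a later authentication error. The sketch below uses a hypothetical helper (not part of LangChain or the OpenAI SDK) that checks the same four variable names the setup cell reads and prints any that are unset or empty.

```python
import os
from typing import List, Sequence

# The four variable names read in the setup cell.
REQUIRED_VARS = ("OPENAI_API_TYPE", "OPENAI_API_BASE", "OPENAI_API_VERSION", "OPENAI_API_KEY")


def missing_env_vars(names: Sequence[str] = REQUIRED_VARS) -> List[str]:
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing settings:", ", ".join(missing))
```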

## Building a Knowledge Graph with LangChain

```python
from langchain.prompts import PromptTemplate
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.graphs.networkx_graph import KG_TRIPLE_DELIMITER

# Few-shot prompt template for knowledge triple extraction.
# The f-strings splice in KG_TRIPLE_DELIMITER now; "{text}" stays a template variable.
_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE = (
    "You are a networked intelligence helping a human track knowledge triples"
    " about all relevant people, things, concepts, etc. and integrating"
    " them with your knowledge stored within your weights"
    " as well as that stored in a knowledge graph."
    " Extract all of the knowledge triples from the text."
    " A knowledge triple is a clause that contains a subject, a predicate,"
    " and an object. The subject is the entity being described,"
    " the predicate is the property of the subject that is being"
    " described, and the object is the value of the property.\n\n"
    "EXAMPLE\n"
    "It's a state in the US. It's also the number 1 producer of gold in the US.\n\n"
    f"Output: (Nevada, is a, state){KG_TRIPLE_DELIMITER}(Nevada, is in, US)"
    f"{KG_TRIPLE_DELIMITER}(Nevada, is the number 1 producer of, gold)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "I'm going to the store.\n\n"
    "Output: NONE\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "Oh huh. I know Descartes likes to drive antique scooters and play the mandolin.\n"
    f"Output: (Descartes, likes to drive, antique scooters){KG_TRIPLE_DELIMITER}(Descartes, plays, mandolin)\n"
    "END OF EXAMPLE\n\n"
    "EXAMPLE\n"
    "{text}\n\n"  # separate the input text from "Output:", as in the examples above
    "Output:"
)

KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT = PromptTemplate(
    input_variables=["text"],
    template=_DEFAULT_KNOWLEDGE_TRIPLE_EXTRACTION_TEMPLATE,
)

# "gpt4" is the Azure deployment name; a low temperature keeps the extraction more consistent
model = AzureChatOpenAI(deployment_name="gpt4", temperature=0)

# Create an LLMChain using the knowledge triple extraction prompt
chain = LLMChain(llm=model, prompt=KNOWLEDGE_TRIPLE_EXTRACTION_PROMPT)

# Run the chain on the input text
text = (
    "The city of Paris is the capital and most populous city of France. "
    "The Eiffel Tower is a famous landmark in Paris."
)

triples = chain.run(text)

print(triples)
```

```
(Paris, is, capital of France)<|>(Paris, is, most populous city of France)<|>(Eiffel Tower, is, famous landmark in Paris)
```
```python
# Extract the triplets from the response into a list
from typing import List


def parse_triples(response: str, delimiter: str = KG_TRIPLE_DELIMITER) -> List[str]:
    if not response:
        return []
    return response.split(delimiter)


triples_list = parse_triples(triples)
print(triples_list)
```

```
['(Paris, is, capital of France)', '(Paris, is, most populous city of France)', '(Eiffel Tower, is, famous landmark in Paris)']
```
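The list above still holds raw strings wrapped in parentheses. A minimal sketch of a stricter parser, assuming the model follows the `(subject, predicate, object)` format from the prompt, turns each item into a tuple and skips anything else, such as a `NONE` answer. The helper name `parse_triple_tuples` is hypothetical; `KG_TRIPLE_DELIMITER` is restated as `"<|>"` (the delimiter visible in the output above) so the block runs on its own.

```python
from typing import List, Tuple

KG_TRIPLE_DELIMITER = "<|>"  # delimiter seen in the model output above


def parse_triple_tuples(
    response: str, delimiter: str = KG_TRIPLE_DELIMITER
) -> List[Tuple[str, str, str]]:
    """Split a model response into (subject, predicate, object) tuples."""
    triples = []
    for chunk in response.split(delimiter):
        chunk = chunk.strip()
        if not (chunk.startswith("(") and chunk.endswith(")")):
            continue  # e.g. the model's "NONE" answer or stray text
        parts = [p.strip() for p in chunk[1:-1].split(",")]
        if len(parts) < 3:
            continue  # not a complete triple
        # Heuristic: first and last fields are the entities; extra commas stay in the predicate
        subject, obj = parts[0], parts[-1]
        predicate = ", ".join(parts[1:-1])
        triples.append((subject, predicate, obj))
    return triples


sample = (
    "(Paris, is, capital of France)<|>(Paris, is, most populous city of France)"
    "<|>(Eiffel Tower, is, famous landmark in Paris)"
)
print(parse_triple_tuples(sample))
print(parse_triple_tuples("NONE"))
```

Keeping extra commas in the predicate is only a heuristic: a comma inside an entity name would still be split in the wrong place.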


## Knowledge Graph Visualization

```python
from typing import List

import networkx as nx
from pyvis.network import Network


# Create a NetworkX graph from the extracted relation triplets
def create_graph_from_triplets(triplets: List[str]) -> nx.DiGraph:
    G = nx.DiGraph()
    for triplet in triplets:
        # Drop the surrounding parentheses of "(subject, predicate, object)"
        triplet = triplet.strip()
        if triplet.startswith("(") and triplet.endswith(")"):
            triplet = triplet[1:-1]
        parts = [p.strip() for p in triplet.split(",")]
        if len(parts) < 3:
            continue  # skip malformed items such as a "NONE" response
        # First and last fields are the entities; any extra commas stay in the predicate
        subject, obj = parts[0], parts[-1]
        predicate = ", ".join(parts[1:-1])
        G.add_edge(subject, obj, label=predicate)
    return G


# Convert the NetworkX graph to a PyVis network (directed, to match the DiGraph)
def nx_to_pyvis(networkx_graph: nx.DiGraph) -> Network:
    pyvis_graph = Network(notebook=True, cdn_resources="in_line", directed=True)
    for node in networkx_graph.nodes():
        pyvis_graph.add_node(node)
    for source, target, data in networkx_graph.edges(data=True):
        pyvis_graph.add_edge(source, target, label=data["label"])
    return pyvis_graph


triplets = [t.strip() for t in triples_list if t.strip()]
graph = create_graph_from_triplets(triplets)
pyvis_network = nx_to_pyvis(graph)

# Customize the appearance of the graph
pyvis_network.toggle_hide_edges_on_drag(True)
pyvis_network.toggle_physics(False)
pyvis_network.set_edge_smooth("discrete")

# Show the interactive knowledge graph visualization
pyvis_network.show("knowledge_graph.html")
```

```
knowledge_graph.html
```
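Besides visualizing it, the NetworkX graph can be queried directly. The sketch below builds a `DiGraph` from the three triples in the output above and collects every fact that mentions a given entity, using `out_edges`/`in_edges` with `data="label"`. The `facts_about` helper and the `example_graph` name are hypothetical.

```python
from typing import List, Tuple

import networkx as nx

# The three triples from the model output above, already parsed into tuples.
example_triples = [
    ("Paris", "is", "capital of France"),
    ("Paris", "is", "most populous city of France"),
    ("Eiffel Tower", "is", "famous landmark in Paris"),
]

example_graph = nx.DiGraph()
for subject, predicate, obj in example_triples:
    example_graph.add_edge(subject, obj, label=predicate)


def facts_about(graph: nx.DiGraph, entity: str) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) for every edge that starts or ends at `entity`."""
    if entity not in graph:
        return []
    outgoing = [(u, label, v) for u, v, label in graph.out_edges(entity, data="label")]
    incoming = [(u, label, v) for u, v, label in graph.in_edges(entity, data="label")]
    return outgoing + incoming


print(facts_about(example_graph, "Paris"))
```

Two things stand out. First, "Eiffel Tower" is linked to the phrase node "famous landmark in Paris" rather than to "Paris" itself, because the model folded the location into the object. Second, a `DiGraph` holds at most one edge per ordered node pair, so if the model returns two different relations between the same subject and object, the later `label` overwrites the earlier one; `nx.MultiDiGraph` keeps parallel edges if that matters for your data.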


> **Note:** LangChain also offers the `GraphIndexCreator` class, which automates the extraction of relation triplets and integrates with question-answering chains. In future articles, we'll explore this feature in more depth and show how it can further support knowledge graph creation and analysis.