Use an LLM to generate a knowledge graph.
The library extracts entities and their relationships from free text without a pre-defined schema.
- Input: paragraphs of text
- Output: nodes and relationships, merged into the knowledge graph database
Install the package:

```bash
pip install llm_ner_nel
```

```python
from llm_ner_nel.inference_api.relationship_inference import RelationshipInferenceProvider, display_relationships

text = '''Vincent de Groof (6 December 1830 – 9 July 1874) was a Dutch-born Belgian early pioneering aeronaut. He created an early model of an ornithopter'''

inference_provider = RelationshipInferenceProvider(model="llama3.2")
relationships = inference_provider.get_relationships(text)
display_relationships(relationships, console_log=True)
```

Output:
```
Vincent de Groof:(1.0) (Person) --[born in]-> Dutch-born Belgian:(1.0) (Location)
Vincent de Groof:(1.0) (Person) --[created an early model of]-> an ornithopter:(1.0) (Thing)
Vincent de Groof:(1.0) (Person) --[pioneering aeronaut]-> early:(1.0) (Attribute)
```
Alternatively, install from source to build a knowledge graph:

```bash
pip install -e .
```

```python
from llm_ner_nel.inference_api.relationship_inference import RelationshipInferenceProvider, display_relationships
from llm_ner_nel.knowledge_graph.graph import KnowledgeGraph

text = '''Australia’s centre-left prime minister, Anthony Albanese, has won a second term with a crushing victory over the opposition, whose rightwing leader, Peter Dutton, failed to brush off comparisons with Donald Trump and ended up losing his own seat.
Australians have voted for a future that holds true to these values, a future built on everything that brings us together as Australians, and everything that sets our nation apart from the world'''

inference_provider = RelationshipInferenceProvider(model="llama3.2")
relationships = inference_provider.get_relationships(text)
display_relationships(relationships, console_log=True)

graph = KnowledgeGraph()
graph.add_or_merge_relationships(relationships, "https://en.wikipedia.org/wiki/Vincent_de_Groof", "Wikipedia")
```

For development, use the conda environment described in the setup below.
Run a local instance of the Neo4J database:
- Run the startNeo4j.sh script
- Or run the following:

```bash
docker run \
-d --rm \
-p 7474:7474 -p 7687:7687 \
--name neo4j-apoc \
--volume=./data:/data \
-e NEO4J_AUTH=neo4j/password \
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-e NEO4J_PLUGINS=\[\"apoc\"\] \
neo4j:2025.03
```

Verify access to Neo4J:
http://localhost:7474/browser/
- Username: neo4j
- Password: password
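Optionally, verify connectivity from Python with the official neo4j driver. This is not part of this library; it is only a quick sanity check:

```python
# Quick sanity check with the official neo4j Python driver (pip install neo4j).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
driver.verify_connectivity()  # raises an exception if the database is unreachable
driver.close()
```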
Install the following dependencies:

```bash
pip install -r requirements.txt
```

Alternatively, set up a conda environment and install the dependencies:
```bash
conda create -n knowledge-graph python=3.10
conda activate knowledge-graph
pip install -r requirements.txt

# if running Jupyter notebook on VSCODE
pip install -U ipykernel
```

Run a local LLM using OLLAMA:
- Download and install OLLAMA: https://ollama.com/download
- Download the LLM model:

```bash
ollama run gemma3:12b
```

Note: if you have a GPU or an environment with high-bandwidth memory, it is recommended to run a larger model, e.g. gemma3:27b.
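To confirm the OLLAMA server is reachable and the model has been pulled, you can query its documented REST API (the /api/tags endpoint lists locally available models); a minimal sketch using only the standard library:

```python
# List locally available Ollama models via the /api/tags endpoint.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as response:
    tags = json.load(response)

print([model["name"] for model in tags.get("models", [])])
```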
```python
from llm_ner_nel.inference_api.relationship_inference import RelationshipInferenceProvider, display_relationships
from llm_ner_nel.knowledge_graph.graph import KnowledgeGraph

relationships_extractor = RelationshipInferenceProvider(
    model="gemma3:27b",
    ollama_host="http://localhost:11434",
    mlflow_tracking_host="http://localhost:5000",
    mlflow_system_prompt_id=None,  # if the mlflow prompt repository is not set up ("NER_System/1")
    mlflow_user_prompt_id=None     # if the mlflow prompt repository is not set up ("NER_User/1")
)

graph = KnowledgeGraph()

# Get relationships from text (`text` is any input passage, e.g. the examples above)
relationships = relationships_extractor.get_relationships(text=text)

# Merge the relationships into the knowledge graph
graph.add_or_merge_relationships(relationships, src="book", src_type="politics")
```

The LLM hyperparameters can also be configured; tuning them helps refine the expected output for concrete use cases:
```python
class LlmConfig:
    model: str
    temperature: float
    top_k: int
    top_p: float
    max_tokens: int
    repeat_penalty: float
    frequency_penalty: float
    presence_penalty: float
    typical_p: float
    num_thread: int
```
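For illustration, a config might look like the following. The values are placeholders, and this assumes LlmConfig accepts keyword arguments like a dataclass; the exact import path and how the config is passed to the provider depend on the library version:

```python
# Illustrative values only -- tune per model and use case.
config = LlmConfig(
    model="gemma3:12b",
    temperature=0.2,        # low randomness favours consistent entity extraction
    top_k=40,               # sample only from the 40 most likely tokens
    top_p=0.9,              # nucleus sampling cut-off
    max_tokens=2048,        # cap on generated tokens
    repeat_penalty=1.1,     # discourage verbatim repetition
    frequency_penalty=0.0,
    presence_penalty=0.0,
    typical_p=1.0,          # disabled (no typical-decoding filtering)
    num_thread=8,           # CPU threads for local inference
)
```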
See the default prompt in prompts.py.
Prompts can vary between models and providers; version-controlling and annotating prompts enables faster evaluation.
MLflow provides a prompt repository and is used to fetch the user prompt and system prompt.
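As an illustration of how a prompt could be registered and fetched, a sketch assuming the MLflow prompt registry introduced around MLflow 2.21 (in MLflow 3.x the same calls live under mlflow.genai); the prompt name and template here are placeholders mirroring the IDs used below:

```python
# Sketch: register and load a prompt via the MLflow prompt registry.
# Assumes MLflow >= 2.21; in 3.x use mlflow.genai.register_prompt / load_prompt.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

# Register a prompt version (one-off).
mlflow.register_prompt(
    name="NER_System",
    template="You are a named-entity recognition engine. Extract entities from: {{ text }}",
)

# Load a pinned version such as "NER_System/1".
prompt = mlflow.load_prompt("prompts:/NER_System/1")
print(prompt.template)
```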
E.g. prompts for named entity recognition:

```bash
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@entity \
-e MLFLOW_USER_PROMPT_ID=NER_User/@entity \
```

E.g. prompts for a knowledge graph using named entity recognition (NER) and named entity linking (NEL):

```bash
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@relationship \
-e MLFLOW_USER_PROMPT_ID=NER_User/@relationship \
```

Two data pipeline examples show how named entity recognition can be used:
- Decorate text with entities
- Build a knowledge graph

In the first example, a folder is passed as an argument; entities and their relationships are extracted from the PDF documents in that folder, and the relationships are merged into the Neo4J knowledge graph:
```bash
docker run -e OLLAMA_MODEL=gemma3:12b \
-e OLLAMA_HOST=http://ollama:11434 \
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@entity \
-e MLFLOW_USER_PROMPT_ID=NER_User/@entity \
-e MLFLOW_TRACKING_HOST=http://mlflow:5000 \
-e NEO4J_URI=bolt://neo4j-apoc:7687 \
-e NEO4J_USERNAME=neo4j \
-e NEO4J_PASSWORD=password \
-v /PATH_TO_FOLDERS_WITH_PDFS:/mnt \
--network development_network \
--rm \
--name pdf_graph_builder \
--hostname pdf_graph_builder \
pdf_graph_builder
```

In the second example, text is read from a MongoDB collection and the extracted entities are stored back to the same collection.
The container reads the collection in batches and performs NER:
```bash
docker run -e OLLAMA_MODEL=gemma3:12b \
-e OLLAMA_HOST=http://ollama:11434 \
-e MONGODB_CONNECTION_STRING=mongodb://mongodb:27017 \
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@entity \
-e MLFLOW_USER_PROMPT_ID=NER_User/@entity \
-e MLFLOW_TRACKING_HOST=http://mlflow:5000 \
--network development_network \
--rm \
--name entity_extractor \
--hostname entity_extractor \
mongodb-entity-extractor
```

Use RabbitMQ to publish/consume messages.
Published messages are processed and merged into a knowledge graph.
Message format:

```json
{
    "src": "https://en.wikipedia.org/wiki/Vincent_de_Groof",
    "src_type": "Newspaper",
    "text": "Vincent de Groof (6 December 1830 – 9 July 1874) was a Dutch-born Belgian early pioneering aeronaut. He created an early model of an ornithopter and successfully demonstrated its use before fatally crashing the machine in London, UK.[1] "
}
```
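Once the broker and consumer below are running, a message in this format can be published with pika; a minimal sketch, assuming the host, vhost, queue name, and credentials taken from the consumer's environment variables below (any queue declaration must match what the consumer declares):

```python
# Sketch: publish one message to the queue consumed by the graph builder.
# Host/vhost/queue/credentials are assumptions taken from the env vars below.
import json
import pika

credentials = pika.PlainCredentials("guest", "guest")
parameters = pika.ConnectionParameters(
    host="localhost", port=5672, virtual_host="dev", credentials=credentials
)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()

message = {
    "src": "https://en.wikipedia.org/wiki/Vincent_de_Groof",
    "src_type": "Newspaper",
    "text": "Vincent de Groof (6 December 1830 - 9 July 1874) was a Dutch-born Belgian early pioneering aeronaut.",
}
channel.basic_publish(
    exchange="",  # default exchange routes by queue name
    routing_key="DatapipelineCleanData",
    body=json.dumps(message),
)
connection.close()
```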
Start rabbitmq:

```bash
docker run -d --rm --network development_network --hostname rabbitmq --name rabbitmq -p 15672:15672 -p 5672:5672 rabbitmq:3.8.12-rc.1-management
```

In this example, data published to a rabbitmq queue is consumed and appended to the Neo4j knowledge graph:
```bash
docker run -e RABBITMQ_PORT=5672 \
-e RABBITMQ_HOST=rabbitmq \
-e RABBITMQ_VHOST=dev \
-e RABBITMQ_QUEUE=DatapipelineCleanData \
-e RABBITMQ_USER=guest \
-e RABBITMQ_PASSWORD=guest \
-e OLLAMA_MODEL=gemma3:12b \
-e OLLAMA_HOST=http://ollama:11434 \
-e NEO4J_URI=bolt://neo4j-apoc:7687 \
-e NEO4J_USERNAME=neo4j \
-e NEO4J_PASSWORD=password \
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@relationship \
-e MLFLOW_USER_PROMPT_ID=NER_User/@relationship \
-e MLFLOW_TRACKING_HOST=http://mlflow:5000 \
--network development_network \
--rm \
--name knowledge_consumer \
--hostname knowledge_consumer \
rabbit-mq-graph-builder
```

Ollama should be running in a Docker container, or a network route to it should be available.
Run the following for a CPU-only ollama instance:

```bash
docker run -d --rm \
-v /usr/share/ollama/.ollama:/root/.ollama \
-p 11434:11434 \
--network development_network \
--name ollama \
--hostname ollama \
ollama/ollama
```

Note: replace the path /usr/share/ollama/.ollama with the host's ollama model path.
Given the sheer number of model options and inference providers, evaluation is necessary.
Evaluation allows methodical selection of hyperparameters:
- User prompt (based on model and provider)
- System prompt (based on model and provider)
- Model
The first step of an evaluation is to define ground truths:
Text --> expected entities
E.g.

```csv
"Input","GroundTruth"
"Amazon and Google announced a partnership.","Amazon|Google"
```
Based on the ground truth, calculate the following:
- True positives (TP)
- False positives (FP)
- False negatives (FN)

Using the above, calculate:
- Precision
- Recall

Finally, calculate an F1 score.
Precision: the proportion of predicted entities that are correct.
High precision means that when the model identifies an entity, it is likely to be correct.

$$\text{Precision} = \frac{TP}{TP + FP}$$

where:
- $TP$ = true positives
- $FP$ = false positives

Recall: the ability to find the relevant entities.
High recall means the model has detected most of the actual entities in the text.

$$\text{Recall} = \frac{TP}{TP + FN}$$

where:
- $TP$ = true positives
- $FN$ = false negatives

The F1 score is the harmonic mean of the two:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
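A small, self-contained sketch of how these metrics could be computed against the ground-truth CSV above; the helper and file names are illustrative, not part of the library:

```python
# Illustrative evaluation helper -- not part of llm_ner_nel.
import csv

def score_entities(predicted: set[str], expected: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, f1) for one example."""
    tp = len(predicted & expected)   # entities found and correct
    fp = len(predicted - expected)   # entities found but wrong
    fn = len(expected - predicted)   # entities missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

with open("ground_truth.csv", newline="") as handle:   # assumed filename
    for row in csv.DictReader(handle):
        expected = set(row["GroundTruth"].split("|"))
        predicted = {"Amazon", "Google"}  # placeholder: substitute model output here
        print(row["Input"], score_entities(predicted, expected))
```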
Publish the evaluation metrics for each "experiment".
MLflow is a good tool for this analysis.
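For example, each evaluation run could be logged as an MLflow run; the experiment name, parameters, and metric values here are illustrative:

```python
# Sketch: log evaluation metrics to the MLflow tracking server.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("ner-evaluation")  # illustrative experiment name

with mlflow.start_run():
    mlflow.log_param("model", "gemma3:12b")
    mlflow.log_param("system_prompt_id", "NER_System/@entity")
    mlflow.log_metric("precision", 0.92)  # placeholder values
    mlflow.log_metric("recall", 0.88)
    mlflow.log_metric("f1", 0.90)
```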
Copyright (C) 2025 Paul Eger
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.