Use an LLM to generate a knowledge graph.
The library extracts entities and their relationships from free text without a pre-defined schema.
- Input: paragraphs of text
- Output: nodes and relationships, merged into the knowledge graph database
Install the package:

```bash
pip install llm_ner_nel
```

```python
from llm_ner_nel.inference_api.relationship_inference import RelationshipInferenceProvider, display_relationships

text = '''Vincent de Groof (6 December 1830 – 9 July 1874) was a Dutch-born Belgian early pioneering aeronaut. He created an early model of an ornithopter'''

inference_provider = RelationshipInferenceProvider(model="llama3.2")
relationships = inference_provider.get_relationships(text)
display_relationships(relationships, console_log=True)
```

Output:
```
Vincent de Groof:(1.0) (Person) --[born in]-> Dutch-born Belgian:(1.0) (Location)
Vincent de Groof:(1.0) (Person) --[created an early model of]-> an ornithopter:(1.0) (Thing)
Vincent de Groof:(1.0) (Person) --[pioneering aeronaut]-> early:(1.0) (Attribute)
```
Alternatively, install from source to build a knowledge graph:

```bash
pip install -e .
```

```python
from llm_ner_nel.inference_api.relationship_inference import RelationshipInferenceProvider, display_relationships
from llm_ner_nel.knowledge_graph.graph import KnowledgeGraph

text = '''Australia’s centre-left prime minister, Anthony Albanese, has won a second term with a crushing victory over the opposition, whose rightwing leader, Peter Dutton, failed to brush off comparisons with Donald Trump and ended up losing his own seat.
Australians have voted for a future that holds true to these values, a future built on everything that brings us together as Australians, and everything that sets our nation apart from the world'''

inference_provider = RelationshipInferenceProvider(model="llama3.2")
relationships = inference_provider.get_relationships(text)
display_relationships(relationships, console_log=True)

graph = KnowledgeGraph()
graph.add_or_merge_relationships(relationships, "https://en.wikipedia.org/wiki/Vincent_de_Groof", "Wikipedia")
```

For development, use the conda environment described in the setup below.
Run a local instance of the Neo4J database:
- Run the startNeo4j.sh script
- Or run the following:

```bash
docker run \
-d --rm \
-p 7474:7474 -p 7687:7687 \
--name neo4j-apoc \
--volume=./data:/data \
-e NEO4J_AUTH=neo4j/password \
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-e NEO4J_PLUGINS=\[\"apoc\"\] \
neo4j:2025.03
```

Verify access to Neo4J:
http://localhost:7474/browser/
- Username: neo4j
- Password: password
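Optionally, verify connectivity from Python with the official neo4j driver. This is not part of this library; it is only a quick sanity check:

```python
# Quick sanity check with the official neo4j Python driver (pip install neo4j).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
driver.verify_connectivity()  # raises an exception if the database is unreachable
driver.close()
```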
Install the following dependencies:

```bash
pip install -r requirements.txt
```

Alternatively, set up a conda environment and install the dependencies:
```bash
conda create -n knowledge-graph python=3.10
conda activate knowledge-graph
pip install -r requirements.txt

# if running Jupyter notebook on VSCODE
pip install -U ipykernel
```

Run a local LLM using OLLAMA:
- Download and install OLLAMA: https://ollama.com/download
- Download the LLM model:

```bash
ollama run gemma3:12b
```

Note: if you have a GPU or an environment with high-bandwidth memory, it is recommended to run a larger model, e.g. gemma3:27b.
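To confirm the OLLAMA server is reachable and the model has been pulled, you can query its documented REST API (the /api/tags endpoint lists locally available models); a minimal sketch using only the standard library:

```python
# List locally available Ollama models via the /api/tags endpoint.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as response:
    tags = json.load(response)

print([model["name"] for model in tags.get("models", [])])
```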
```python
from llm_ner_nel.inference_api.relationship_inference import RelationshipInferenceProvider, display_relationships
from llm_ner_nel.knowledge_graph.graph import KnowledgeGraph

relationships_extractor = RelationshipInferenceProvider(
    model="gemma3:27b",
    ollama_host="http://localhost:11434",
    mlflow_tracking_host="http://localhost:5000",
    mlflow_system_prompt_id=None,  # if the mlflow prompt repository is not set up ("NER_System/1")
    mlflow_user_prompt_id=None     # if the mlflow prompt repository is not set up ("NER_User/1")
)

graph = KnowledgeGraph()

# Get relationships from text (`text` is any input passage, e.g. the examples above)
relationships = relationships_extractor.get_relationships(text=text)

# Merge the relationships into the knowledge graph
graph.add_or_merge_relationships(relationships, src="book", src_type="politics")
```

The LLM hyperparameters can also be configured; tuning them helps refine the expected output for concrete use cases:
```python
class LlmConfig:
    model: str
    temperature: float
    top_k: int
    top_p: float
    max_tokens: int
    repeat_penalty: float
    frequency_penalty: float
    presence_penalty: float
    typical_p: float
    num_thread: int
```
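For illustration, a config might look like the following. The values are placeholders, and this assumes LlmConfig accepts keyword arguments like a dataclass; the exact import path and how the config is passed to the provider depend on the library version:

```python
# Illustrative values only -- tune per model and use case.
config = LlmConfig(
    model="gemma3:12b",
    temperature=0.2,        # low randomness favours consistent entity extraction
    top_k=40,               # sample only from the 40 most likely tokens
    top_p=0.9,              # nucleus sampling cut-off
    max_tokens=2048,        # cap on generated tokens
    repeat_penalty=1.1,     # discourage verbatim repetition
    frequency_penalty=0.0,
    presence_penalty=0.0,
    typical_p=1.0,          # disabled (no typical-decoding filtering)
    num_thread=8,           # CPU threads for local inference
)
```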
See the default prompt in prompts.py.
Prompts can vary between models and providers; version-controlling and annotating prompts enables faster evaluation.
MLflow provides a prompt repository and is used to fetch the user prompt and system prompt.
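As an illustration of how a prompt could be registered and fetched, a sketch assuming the MLflow prompt registry introduced around MLflow 2.21 (in MLflow 3.x the same calls live under mlflow.genai); the prompt name and template here are placeholders mirroring the IDs used below:

```python
# Sketch: register and load a prompt via the MLflow prompt registry.
# Assumes MLflow >= 2.21; in 3.x use mlflow.genai.register_prompt / load_prompt.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

# Register a prompt version (one-off).
mlflow.register_prompt(
    name="NER_System",
    template="You are a named-entity recognition engine. Extract entities from: {{ text }}",
)

# Load a pinned version such as "NER_System/1".
prompt = mlflow.load_prompt("prompts:/NER_System/1")
print(prompt.template)
```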
E.g. prompts for named entity recognition:

```bash
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@entity \
-e MLFLOW_USER_PROMPT_ID=NER_User/@entity \
```

E.g. prompts for a knowledge graph using named entity recognition (NER) and named entity linking (NEL):

```bash
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@relationship \
-e MLFLOW_USER_PROMPT_ID=NER_User/@relationship \
```

Two data pipeline examples show how named entity recognition can be used:
- Decorate text with entities
- Build a knowledge graph

In the first example, a folder is passed as an argument; entities and their relationships are extracted from the PDF documents in that folder, and the relationships are merged into the Neo4J knowledge graph:
```bash
docker run -e OLLAMA_MODEL=gemma3:12b \
-e OLLAMA_HOST=http://ollama:11434 \
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@entity \
-e MLFLOW_USER_PROMPT_ID=NER_User/@entity \
-e MLFLOW_TRACKING_HOST=http://mlflow:5000 \
-e NEO4J_URI=bolt://neo4j-apoc:7687 \
-e NEO4J_USERNAME=neo4j \
-e NEO4J_PASSWORD=password \
-v /PATH_TO_FOLDERS_WITH_PDFS:/mnt \
--network development_network \
--rm \
--name pdf_graph_builder \
--hostname pdf_graph_builder \
pdf_graph_builder
```

In the second example, text is read from a MongoDB collection and the extracted entities are stored back to the same collection.
The container reads the collection in batches and performs NER:
```bash
docker run -e OLLAMA_MODEL=gemma3:12b \
-e OLLAMA_HOST=http://ollama:11434 \
-e MONGODB_CONNECTION_STRING=mongodb://mongodb:27017 \
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@entity \
-e MLFLOW_USER_PROMPT_ID=NER_User/@entity \
-e MLFLOW_TRACKING_HOST=http://mlflow:5000 \
--network development_network \
--rm \
--name entity_extractor \
--hostname entity_extractor \
mongodb-entity-extractor
```

Use RabbitMQ to publish/consume messages.
Published messages are processed and merged into a knowledge graph.
Message format:

```json
{
    "src": "https://en.wikipedia.org/wiki/Vincent_de_Groof",
    "src_type": "Newspaper",
    "text": "Vincent de Groof (6 December 1830 – 9 July 1874) was a Dutch-born Belgian early pioneering aeronaut. He created an early model of an ornithopter and successfully demonstrated its use before fatally crashing the machine in London, UK.[1] "
}
```
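Once the broker and consumer below are running, a message in this format can be published with pika; a minimal sketch, assuming the host, vhost, queue name, and credentials taken from the consumer's environment variables below (any queue declaration must match what the consumer declares):

```python
# Sketch: publish one message to the queue consumed by the graph builder.
# Host/vhost/queue/credentials are assumptions taken from the env vars below.
import json
import pika

credentials = pika.PlainCredentials("guest", "guest")
parameters = pika.ConnectionParameters(
    host="localhost", port=5672, virtual_host="dev", credentials=credentials
)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()

message = {
    "src": "https://en.wikipedia.org/wiki/Vincent_de_Groof",
    "src_type": "Newspaper",
    "text": "Vincent de Groof (6 December 1830 - 9 July 1874) was a Dutch-born Belgian early pioneering aeronaut.",
}
channel.basic_publish(
    exchange="",  # default exchange routes by queue name
    routing_key="DatapipelineCleanData",
    body=json.dumps(message),
)
connection.close()
```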
Start rabbitmq:

```bash
docker run -d --rm --network development_network --hostname rabbitmq --name rabbitmq -p 15672:15672 -p 5672:5672 rabbitmq:3.8.12-rc.1-management
```

In this example, data published to a rabbitmq queue is consumed and appended to the Neo4j knowledge graph:
```bash
docker run -e RABBITMQ_PORT=5672 \
-e RABBITMQ_HOST=rabbitmq \
-e RABBITMQ_VHOST=dev \
-e RABBITMQ_QUEUE=DatapipelineCleanData \
-e RABBITMQ_USER=guest \
-e RABBITMQ_PASSWORD=guest \
-e OLLAMA_MODEL=gemma3:12b \
-e OLLAMA_HOST=http://ollama:11434 \
-e NEO4J_URI=bolt://neo4j-apoc:7687 \
-e NEO4J_USERNAME=neo4j \
-e NEO4J_PASSWORD=password \
-e MLFLOW_SYSTEM_PROMPT_ID=NER_System/@relationship \
-e MLFLOW_USER_PROMPT_ID=NER_User/@relationship \
-e MLFLOW_TRACKING_HOST=http://mlflow:5000 \
--network development_network \
--rm \
--name knowledge_consumer \
--hostname knowledge_consumer \
rabbit-mq-graph-builder
```

Ollama should be running in a Docker container, or a network route to it should be available.
Run the following for a CPU-only ollama instance:

```bash
docker run -d --rm \
-v /usr/share/ollama/.ollama:/root/.ollama \
-p 11434:11434 \
--network development_network \
--name ollama \
--hostname ollama \
ollama/ollama
```

Note: replace the path /usr/share/ollama/.ollama with the host's ollama model path.
Given the sheer number of model options and inference providers, evaluation is necessary.
Evaluation allows methodical selection of hyperparameters:
- User prompt (based on model and provider)
- System prompt (based on model and provider)
- Model
The first step of an evaluation is to define ground truths:
Text --> expected entities
E.g.

```csv
"Input","GroundTruth"
"Amazon and Google announced a partnership.","Amazon|Google"
```
Based on the ground truth, calculate the following:
- True positives (TP)
- False positives (FP)
- False negatives (FN)

Using the above, calculate:
- Precision
- Recall

Finally, calculate an F1 score.
Precision: the proportion of predicted entities that are correct.
High precision means that when the model identifies an entity, it is likely to be correct.

$$\text{Precision} = \frac{TP}{TP + FP}$$

where:
- $TP$ = true positives
- $FP$ = false positives

Recall: the ability to find the relevant entities.
High recall means the model has detected most of the actual entities in the text.

$$\text{Recall} = \frac{TP}{TP + FN}$$

where:
- $TP$ = true positives
- $FN$ = false negatives

The F1 score is the harmonic mean of the two:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
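A small, self-contained sketch of how these metrics could be computed against the ground-truth CSV above; the helper and file names are illustrative, not part of the library:

```python
# Illustrative evaluation helper -- not part of llm_ner_nel.
import csv

def score_entities(predicted: set[str], expected: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, f1) for one example."""
    tp = len(predicted & expected)   # entities found and correct
    fp = len(predicted - expected)   # entities found but wrong
    fn = len(expected - predicted)   # entities missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

with open("ground_truth.csv", newline="") as handle:   # assumed filename
    for row in csv.DictReader(handle):
        expected = set(row["GroundTruth"].split("|"))
        predicted = {"Amazon", "Google"}  # placeholder: substitute model output here
        print(row["Input"], score_entities(predicted, expected))
```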
Publish the evaluation metrics for each "experiment".
MLflow is a good tool for this analysis.
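For example, each evaluation run could be logged as an MLflow run; the experiment name, parameters, and metric values here are illustrative:

```python
# Sketch: log evaluation metrics to the MLflow tracking server.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("ner-evaluation")  # illustrative experiment name

with mlflow.start_run():
    mlflow.log_param("model", "gemma3:12b")
    mlflow.log_param("system_prompt_id", "NER_System/@entity")
    mlflow.log_metric("precision", 0.92)  # placeholder values
    mlflow.log_metric("recall", 0.88)
    mlflow.log_metric("f1", 0.90)
```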
Copyright (C) 2025 Paul Eger
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.