### Connect to Neo4j

In [23]:
# import necessary libraries
import time
import pandas as pd
from helpers.helper_functions import (
    init_connection,
    create_worksheet_lists,
    create_nodes,
    create_relationships,
    reset_db,
)

# set up timer for runtime of the script
start_time = time.time()

# initialize connection to the database
graph = init_connection()

# open the source Excel workbook
excel_file = pd.ExcelFile("data/knowledge_graph.xlsx")

Connected to the database


### Reset Knowledge Graph
To avoid duplicates, the database is wiped completely before loading, so that the knowledge graph is rebuilt from an empty state.

In [24]:
# call the function to reset the database
reset_db(graph=graph)

Database reset completed
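
The body of `reset_db` lives in `helpers.helper_functions` and is not shown here. As a minimal sketch of what such a reset could look like, assuming `graph` exposes a `run(query)` method (as py2neo's `Graph` does), one Cypher statement is enough. The name `reset_db_sketch` is hypothetical and is not the notebook's helper.

```python
def reset_db_sketch(graph):
    """Remove every node and relationship from the database.

    Assumption: `graph` has a `run(cypher)` method, e.g. a py2neo Graph.
    """
    # DETACH DELETE removes each matched node together with its relationships,
    # so no separate relationship cleanup is needed.
    graph.run("MATCH (n) DETACH DELETE n")
    print("Database reset completed")
```

For very large graphs a single delete transaction can run out of memory, so a batched delete may be preferable there; for a dataset of this size the one-liner should be sufficient.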


### Create Worksheet Lists
Before creating the knowledge graph, the worksheets in the Excel file must be split into node worksheets and relationship worksheets, because the two kinds are processed differently.

In [25]:
# call function to split the worksheets into node and relationship lists and store them in variables
node_worksheets, rel_worksheets = create_worksheet_lists(excel_file=excel_file)

Node worksheets:  ['Doctor', 'Topic', 'SubTopic', 'Illness', 'Symptom', 'Cause', 'Treatment', 'Patient', 'Drug', 'Diagnosis', 'Hospital', 'Allergy', 'Insurance', 'Department']
Relationship worksheets:  ['REL_Doctor', 'REL_Topic', 'REL_Illness', 'REL_Symptom', 'REL_Patient', 'REL_Hospital'] 
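
Judging from the output above, relationship worksheets appear to be distinguished by a `REL_` prefix. A minimal sketch of such a split, assuming that prefix is the only criterion (the real `create_worksheet_lists` is not shown and may differ), could look like this. The function name `split_worksheets` is hypothetical.

```python
def split_worksheets(sheet_names, rel_prefix="REL_"):
    """Split sheet names into node sheets and relationship sheets.

    Assumption: relationship sheets are exactly those whose name starts
    with `rel_prefix`; everything else is treated as a node sheet.
    """
    node_sheets = [name for name in sheet_names if not name.startswith(rel_prefix)]
    rel_sheets = [name for name in sheet_names if name.startswith(rel_prefix)]
    return node_sheets, rel_sheets


nodes, rels = split_worksheets(["Doctor", "Topic", "REL_Doctor", "REL_Topic"])
print("Node worksheets: ", nodes)          # ['Doctor', 'Topic']
print("Relationship worksheets: ", rels)   # ['REL_Doctor', 'REL_Topic']
```

With the workbook opened above, the input list would come from `excel_file.sheet_names`.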



### Create Nodes
To create all nodes, we iterate over the node_worksheets list created above and read the data from each worksheet.

In [26]:
# call function to create nodes
create_nodes(worksheets=node_worksheets, excel_file=excel_file, graph=graph)

Created 10 nodes with the label 'Doctor'
Created 10 nodes with the label 'Topic'
Created 20 nodes with the label 'SubTopic'
Created 30 nodes with the label 'Illness'
Created 55 nodes with the label 'Symptom'
Created 55 nodes with the label 'Cause'
Created 55 nodes with the label 'Treatment'
Created 20 nodes with the label 'Patient'
Created 60 nodes with the label 'Drug'
Created 30 nodes with the label 'Diagnosis'
Created 5 nodes with the label 'Hospital'
Created 15 nodes with the label 'Allergy'
Created 5 nodes with the label 'Insurance'
Created 10 nodes with the label 'Department'
All nodes have been created successfully. In total: 14 node types.
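
The implementation of `create_nodes` is not shown. One way to load a worksheet's rows as nodes, sketched here under assumptions, is to pass all rows as a parameter and let Cypher `UNWIND` them. The sketch assumes each row is a plain dict of property values and that `graph.run` accepts query parameters as keyword arguments (as py2neo's `Graph.run` does). The names `build_node_query` and `create_nodes_sketch` are hypothetical.

```python
def build_node_query(label):
    """Return a Cypher statement that creates one node per row in $rows."""
    # Labels cannot be supplied as Cypher parameters, so the label is
    # interpolated into the query text; backticks allow labels with spaces.
    return f"UNWIND $rows AS row CREATE (n:`{label}`) SET n = row"


def create_nodes_sketch(graph, label, rows):
    """Create one node per dict in `rows` and return the number created.

    Assumption: `rows` is a list of dicts, e.g. obtained from
    excel_file.parse(label).to_dict("records").
    """
    graph.run(build_node_query(label), rows=rows)
    print(f"Created {len(rows)} nodes with the label '{label}'")
    return len(rows)
```

Sending all rows in one parameterized statement keeps the number of round trips to the database at one per worksheet, which matters more as worksheets grow.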


### Create Relationships
To create all relationships, we iterate over the rel_worksheets list created above and read the data from each worksheet.

In [27]:
# call function to create relationships
create_relationships(worksheets=rel_worksheets, excel_file=excel_file, graph=graph)

Created 190 relationships from a 'Doctor' node
Created 20 relationships from a 'Topic' node
Created 150 relationships from a 'Illness' node
Created 60 relationships from a 'Symptom' node
Created 180 relationships from a 'Patient' node
Created 10 relationships from a 'Hospital' node
[32mAll relationships have been created successfully.
[0m
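
The column layout of the `REL_` worksheets and the body of `create_relationships` are not shown, so the following is only a sketch under stated assumptions: every node carries an `id` property, and each relationship row supplies a `from_id` and a `to_id`. The function name `build_rel_query`, those column names, and the relationship type `TREATS` in the usage line are all hypothetical.

```python
def build_rel_query(from_label, to_label, rel_type):
    """Return a Cypher statement that links existing nodes pairwise.

    Assumption: nodes are identified by an `id` property and $pairs is a
    list of maps with the keys `from_id` and `to_id`.
    """
    return (
        "UNWIND $pairs AS pair "
        f"MATCH (a:`{from_label}` {{id: pair.from_id}}) "
        f"MATCH (b:`{to_label}` {{id: pair.to_id}}) "
        # MERGE creates the relationship only if it does not exist yet
        f"MERGE (a)-[:`{rel_type}`]->(b)"
    )


# Hypothetical usage: 'TREATS' is an illustrative relationship type
print(build_rel_query("Doctor", "Patient", "TREATS"))
```

Using `MERGE` rather than `CREATE` for the relationship makes the load idempotent, which complements the full reset performed earlier.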


### Timestamp
Print the date and time at which the script was executed, along with its total runtime.

In [28]:
# print when the script was executed and how long it took
print(f"This script was run on: {time.strftime('%d.%m.%Y %H:%M:%S')}")
end_time = time.time()
total_time = end_time - start_time
print(f"Total execution time: {total_time:.2f} seconds")

This script was run on: 22.04.2025 16:18:37
Total execution time: 2.05 seconds
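
The timing above uses `time.time()` at the start and end of the notebook. As an optional alternative sketch, a small context manager based on `time.perf_counter()` (a monotonic clock intended for measuring intervals) can time individual steps. The helper name `timed` is hypothetical and not part of the notebook's helpers.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Measure the wall-clock duration of the enclosed block."""
    result = {"label": label, "seconds": None}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Runs even if the block raises, so the duration is always recorded
        result["seconds"] = time.perf_counter() - start
        print(f"{label}: {result['seconds']:.2f} seconds")


with timed("Example step") as t:
    sum(range(100_000))
```

Each loading step (reset, nodes, relationships) could be wrapped in its own `with timed(...)` block to see where the runtime is spent.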
