### Connect to Neo4j

In [1]:
# import necessary libraries
import pandas as pd
from helpers.helper_functions import init_connection, create_worksheet_lists, create_nodes, create_relationships

# initialize connection to the database
graph = init_connection()

# open the source workbook (a pd.ExcelFile object, despite the variable name)
source_file_path = pd.ExcelFile("data/knowledge_graph.xlsx")

Connected to the database


### Reset Knowledge Graph
To avoid duplicates, we delete all existing nodes and relationships so that the import starts from an empty database.

In [2]:
# define query to delete all nodes and relationships
reset_db_query = "MATCH (n) DETACH DELETE n"

# run specified query
graph.run(reset_db_query)

# print confirmation message
print("Database reset complete. All nodes and relationships have been deleted.")

Database reset complete. All nodes and relationships have been deleted.
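
To confirm that the reset actually emptied the database, we can count the remaining nodes. The sketch below assumes that `graph.run()` returns a result object with a `.data()` method that yields a list of dictionaries, as py2neo's `Cursor` does. `count_nodes` is a hypothetical helper and not part of `helpers.helper_functions`.

```python
def count_nodes(graph):
    """Return the number of nodes currently stored in the database."""
    # Assumption: graph.run(...) returns an object whose .data() method
    # gives a list of dicts, one dict per result row.
    rows = graph.run("MATCH (n) RETURN count(n) AS node_count").data()
    return rows[0]["node_count"]
```

Right after the reset, `count_nodes(graph)` should return 0.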


### Create Worksheet Lists
Before we can create the knowledge graph, we need to split the worksheets of the Excel file into node worksheets and relationship worksheets, because the two kinds are processed differently.

In [3]:
# call function to split the worksheet names into node and relationship lists
node_worksheets, rel_worksheets = create_worksheet_lists(file_path=source_file_path)

Node worksheets:  ['Doctor', 'Topic', 'SubTopic', 'Illness', 'Symptom', 'Cause', 'Treatment', 'Patient', 'Drug', 'Diagnosis', 'Hospital', 'Allergy', 'Insurance', 'Department']
Relationship worksheets:  ['REL_Doctor', 'REL_Topic', 'REL_Illness', 'REL_Symptom', 'REL_Patient', 'REL_Hospital'] 
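
The output suggests that relationship worksheets are recognised by the `REL_` prefix in their names. The implementation of `create_worksheet_lists` is not shown here, so the following is only a minimal sketch of that idea; `split_worksheets` and its `rel_prefix` parameter are hypothetical names. With pandas, the list of names is available as `source_file_path.sheet_names`.

```python
def split_worksheets(sheet_names, rel_prefix="REL_"):
    """Split worksheet names into node sheets and relationship sheets."""
    # Assumption: a sheet describes relationships exactly when its
    # name starts with rel_prefix; every other sheet describes nodes.
    node_sheets = [name for name in sheet_names if not name.startswith(rel_prefix)]
    rel_sheets = [name for name in sheet_names if name.startswith(rel_prefix)]
    return node_sheets, rel_sheets


nodes, rels = split_worksheets(["Doctor", "Topic", "REL_Doctor", "REL_Topic"])
print(nodes)  # ['Doctor', 'Topic']
print(rels)   # ['REL_Doctor', 'REL_Topic']
```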



### Create Nodes
To create the nodes, we iterate over the `node_worksheets` list we've just created and read the data of each worksheet. The worksheet name is used as the node label.

In [4]:
# call function to create nodes
create_nodes(worksheets=node_worksheets, file_path=source_file_path, graph=graph)

Created 10 nodes with the label 'Doctor'
Created 10 nodes with the label 'Topic'
Created 20 nodes with the label 'SubTopic'
Created 30 nodes with the label 'Illness'
Created 55 nodes with the label 'Symptom'
Created 55 nodes with the label 'Cause'
Created 55 nodes with the label 'Treatment'
Created 20 nodes with the label 'Patient'
Created 60 nodes with the label 'Drug'
Created 30 nodes with the label 'Diagnosis'
Created 5 nodes with the label 'Hospital'
Created 15 nodes with the label 'Allergy'
Created 5 nodes with the label 'Insurance'
Created 10 nodes with the label 'Department'
All nodes have been created successfully. In total:  14  node types.
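
The body of `create_nodes` is not shown, so the following is only a sketch of one way a worksheet could be turned into nodes: one Cypher statement per worksheet, with the rows passed as a parameter. `build_node_query` is a hypothetical helper, and the assumption that every column becomes a node property is ours.

```python
def build_node_query(label, rows):
    """Return a Cypher statement and its parameters that create one node per row."""
    # The label is interpolated into the query text, so reject anything
    # that is not a plain identifier before building the statement.
    if not label.isidentifier():
        raise ValueError(f"Unsafe label: {label!r}")
    # UNWIND turns the list parameter into one row per element;
    # SET n = row copies every key of the row onto the new node.
    query = f"UNWIND $rows AS row CREATE (n:{label}) SET n = row"
    return query, {"rows": rows}


query, params = build_node_query("Doctor", [{"name": "Dr. A"}, {"name": "Dr. B"}])
print(query)  # UNWIND $rows AS row CREATE (n:Doctor) SET n = row
```

With pandas, the rows for one worksheet could come from `source_file_path.parse(label).to_dict("records")`. The query and parameters could then be passed to `graph.run(query, params)`, provided the connection object accepts a parameter dictionary, as py2neo's `Graph.run` does.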



### Create Relationships
To create the relationships, we iterate over the `rel_worksheets` list we've just created and read the data of each worksheet. Each `REL_` worksheet holds the relationships that start from nodes of the corresponding label.

In [5]:
# call function to create relationships
create_relationships(worksheets=rel_worksheets, file_path=source_file_path, graph=graph)

Created 190 relationships from a 'Doctor' node
Created 20 relationships from a 'Topic' node
Created 150 relationships from a 'Illness' node
Created 60 relationships from a 'Symptom' node
Created 180 relationships from a 'Patient' node
Created 10 relationships from a 'Hospital' node
All relationships have been created successfully.
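
How `create_relationships` maps worksheet rows to Cypher is not shown. As a rough sketch, a relationship can be created by matching both end nodes and then merging the connection between them. Everything specific below is an assumption for illustration: the helper name `build_relationship_query`, the `source` and `target` row keys, the `name` lookup property, and the relationship type `TREATS`.

```python
def build_relationship_query(source_label, rel_type, target_label, key="name"):
    """Return a Cypher statement that links existing nodes, one link per row."""
    # All four values end up in the query text, so allow identifiers only.
    for identifier in (source_label, rel_type, target_label, key):
        if not identifier.isidentifier():
            raise ValueError(f"Unsafe identifier: {identifier!r}")
    # Assumption: each row carries 'source' and 'target' values that
    # identify the two end nodes through the property named by `key`.
    return (
        f"UNWIND $rows AS row "
        f"MATCH (a:{source_label} {{{key}: row.source}}) "
        f"MATCH (b:{target_label} {{{key}: row.target}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )


print(build_relationship_query("Doctor", "TREATS", "Patient"))
```

Because both end nodes are looked up with `MATCH`, this approach only works once the nodes exist, which is why the nodes are created first. `MERGE` keeps a repeated run from adding the same relationship twice.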

