# ImportToDesktop
Imports data and metadata CSV files into a Neo4j Desktop database.

Before running this notebook, follow the [instructions](https://github.com/sbl-sdsc/kg-import/tree/main?tab=readme-ov-file#data-import-into-neo4j-knowledge-graph) and start the Neo4j Graph DBMS.

In [1]:
# Path to environment file
ENV_PATH = "../.env_desktop"

## Setup

In [2]:
import os
import time
import pandas as pd
from py2neo import Graph
import shutil
import requests

from dotenv import load_dotenv
load_dotenv(ENV_PATH, override=True)

pd.set_option('display.max_colwidth', None)

In [3]:
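def download_http(url, filename):
    """Download url to filename, replacing any existing copy."""
    if os.path.exists(filename):
        os.remove(filename)
    data = requests.get(url)
    data.raise_for_status()  # fail early instead of saving an error page
    with open(filename, 'wb') as file:
        file.write(data.content)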

In [4]:
# copy required files (temporary solution)
download_http("https://raw.githubusercontent.com/pwrose/neo4j-ipycytoscape/master/notebooks/neo4j_utils.py", "neo4j_utils.py")

In [5]:
import neo4j_utils
import neo4j_bulk_importer

## Import the Knowledge Graph
CSV data and metadata files are uploaded from the kg directory into the Neo4j Graph database using the kg-import bulk upload scripts. For a description of the data organization and the metadata specification, see the [kg-import repository](https://github.com/sbl-sdsc/kg-import).

In [6]:
neo4j_bulk_importer.import_from_csv_to_neo4j_desktop(verbose=True)

drop_database: '/Users/Peter/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/dbms-763cbf0d-6660-412a-8042-c55ae40d5290/bin/cypher-shell' -d system -u neo4j -p neo4jdemo 'DROP DATABASE `kg-import` IF EXISTS;'




run_bulk_import: cd '/Users/Peter/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/dbms-763cbf0d-6660-412a-8042-c55ae40d5290/import'; '/Users/Peter/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/dbms-763cbf0d-6660-412a-8042-c55ae40d5290/bin/neo4j-admin' database import full kg-import --overwrite-destination --skip-bad-relationships --skip-duplicate-nodes --multiline-fields --array-delimiter='|' @args.txt
Neo4j version: 5.12.0
Importing the contents of these files into /Users/Peter/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/dbms-763cbf0d-6660-412a-8042-c55ae40d5290/data/databases/kg-import:
Nodes:
  [Disease]:
  /Users/Peter/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/dbms-763cbf0d-6660-412a-8042-c55ae40d5290/import/header_Disease_n.csv
  /Users/Peter/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/dbms-763cbf0d-6660-412a-8042-c55ae40d5290/import/Disease_n.csv

## Connect to the local Neo4j Graph database

In [7]:
database = os.environ.get("NEO4J_DATABASE")
username = os.environ.get("NEO4J_USERNAME")
password = os.environ.get("NEO4J_PASSWORD")

graph = Graph("bolt://localhost:7687", name=database, user=username, password=password)
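The connection above depends on three variables from the environment file. As a minimal sketch, a small helper could fail early with a clear message when one of them is missing or empty. The helper name `read_neo4j_settings` and the `REQUIRED_VARS` tuple are hypothetical and not part of kg-import.

```python
import os

# Variable names taken from the connection cell above.
REQUIRED_VARS = ("NEO4J_DATABASE", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def read_neo4j_settings(env=None):
    """Return the required settings as a dict; raise if any is unset or empty.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `read_neo4j_settings()` right after `load_dotenv` would surface a wrong `ENV_PATH` before the connection attempt.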

## Metadata

### Node metadata
The MetaNodes and MetaRelationships define the structure of the KG and the properties of nodes and relationships. The query below lists the nodes and their properties.

In [8]:
query = """
MATCH (n:MetaNode) RETURN n;
"""
df = graph.run(query).to_data_frame()
metadata = df["n"].tolist()
metadata = pd.DataFrame(metadata)
metadata.fillna("", inplace=True)
metadata

| | nodeName | synonyms | name | location | id | population | firstName | lastName | smoker | sex | age |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | State | Alternate names of state (string[]) | Name of state (string) | Latitude and longitude in WGS-84 format (point{crs:WGS-84}) | Geonames.org id for location (string) | Population (int) | | | | | |
| 1 | City | Alternate names of city (string[]) | Name of city (string) | Latitude and longitude in WGS-84 format (point{crs:WGS-84}) | Geonames.org id for location (string) | Population (int) | | | | | |
| 2 | Symptom | | Name of symptom (string) | | Symptom id from Symptom Ontology (string) | | | | | | |
| 3 | Patient | | | | Unique patient id (string) | | First name (string) | Last name (string) | Patient is a smoker (boolean) | Biological sex (string) | Age (int) |
| 4 | Disease | | Name of disease from Human Disease Ontology (string) | | Disease id from Human Disease Ontology (string) | | | | | | |
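Each MetaNode property value in the output above appears to follow the pattern "description (type)". As a sketch under that assumption, the description and the data type can be separated with a regular expression. The function name `split_description` is hypothetical and not part of kg-import.

```python
import re

# Matches "<description> (<type>)"; the type is the final parenthesized group.
_DESCRIPTION_PATTERN = re.compile(r"^(?P<description>.*\S)\s*\((?P<type>[^()]+)\)\s*$")


def split_description(value):
    """Split a metadata value into (description, type).

    Empty cells give ('', ''); values without a trailing "(type)" are
    returned unchanged with an empty type.
    """
    if not value:
        return "", ""
    match = _DESCRIPTION_PATTERN.match(value)
    if match is None:
        return value, ""
    return match.group("description"), match.group("type")


print(split_description("Population (int)"))  # ('Population', 'int')
```

Applied to the `metadata` DataFrame column by column, this would give a compact view of the property types defined for each node.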


## Number of nodes

In [9]:
query = """
MATCH (n) RETURN COUNT(n);
"""
print(f"Total number of nodes: {graph.evaluate(query)}")

Total number of nodes: 19


In [10]:
query = """
MATCH (n) RETURN labels(n)[0] AS Node, COUNT(n) AS Count
ORDER BY Count DESC
"""
graph.run(query).to_data_frame()

| | Node | Count |
|---|---|---|
| 0 | MetaNode | 5 |
| 1 | Symptom | 4 |
| 2 | Disease | 3 |
| 3 | Patient | 3 |
| 4 | City | 3 |
| 5 | State | 1 |
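Because the query groups by the first label only (`labels(n)[0]`), each node is counted exactly once, so the per-label counts should add up to the total reported earlier. A quick sanity check, with the numbers copied by hand from the two outputs above:

```python
# Per-label counts copied from the table above.
counts_by_label = {
    "MetaNode": 5,
    "Symptom": 4,
    "Disease": 3,
    "Patient": 3,
    "City": 3,
    "State": 1,
}
total_nodes = 19  # from "Total number of nodes" above

label_sum = sum(counts_by_label.values())
print(label_sum == total_nodes)  # True
```

A mismatch would suggest unlabeled nodes or a change in the database between the two queries.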


## Number of relationships by relationship type

In [11]:
query = """
MATCH ()-[r]->() RETURN type(r) AS Relationship, count(r) AS Count
ORDER BY Count DESC
"""
graph.run(query).to_data_frame()

| | Relationship | Count |
|---|---|---|
| 0 | MetaRelationship | 5 |
| 1 | PRESENTS | 4 |
| 2 | SHOWS | 4 |
| 3 | LOCATED_IN | 3 |
| 4 | DIAGNOSED_WITH | 3 |
| 5 | LIVES_IN | 3 |
