# Uploading Data to Neo4j Server
## The Marvel Comics character collaboration graph
For this demonstration project, we use graph data from *The Marvel Comics character collaboration graph*, originally constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. The data is drawn from Marvel's superhero comic books and links heroes to the comics they appear in. While this is obviously a lighthearted use case, it still demonstrates the capabilities of a Dockerized environment for analyzing graph data.


## The Uploading Code
We use the py2neo Python package to interact with the server. The code below defines and then calls functions that read the data into Python and use py2neo to send the queries that create the intended graph structure. Note that before calling the node or relationship functions, we first create indexes on the server. Indexed properties can be matched much more efficiently, which speeds up the upload, particularly the relationship step, where every row looks up a hero and a comic by name.
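
The upload functions read each file with `csv.DictReader`, so they depend on the header names: `node` and `type` in `nodes.csv`, and `hero` and `comic` in `edges.csv`. The following is a minimal sketch of that layout, assuming nothing beyond those column names. The sample rows and the `read_rows` helper are hypothetical and are not taken from the real data files.

```python
import csv
import io

# Hypothetical sample rows; the real files are data/nodes.csv and data/edges.csv.
NODES_CSV = "node,type\nSPIDER-MAN,hero\nEXAMPLE COMIC 1,comic\n"
EDGES_CSV = "hero,comic\nSPIDER-MAN,EXAMPLE COMIC 1\n"

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

node_rows = read_rows(NODES_CSV)
edge_rows = read_rows(EDGES_CSV)

# Each node row supplies a label ('type') and a name ('node');
# each edge row names the hero and the comic to connect.
for row in node_rows:
    print(row["type"], row["node"])
for row in edge_rows:
    print(row["hero"], "APPEARS_IN", row["comic"])
```

Because `DictReader` looks fields up by header name, the order of the columns in the files does not matter.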

**Please note that uploading the data may take several minutes depending on your system (just under 6 minutes in the run shown below).**

In [1]:
%cd /home/jovyan/work
from py2neo import Graph, Node, Relationship, NodeMatcher
import csv
import time

# Connect to the Neo4j database
graph = Graph("bolt://neo4j:7687", auth=("neo4j", "1234"))

# Function to create nodes
def create_nodes_from_csv(file_path):
    with open(file_path, newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            node = Node(row['type'], name=row['node'])
            graph.create(node)

# Function to create relationships
def create_relationships_from_csv(file_path):
    # Reuse a single matcher instead of building a new one for every lookup
    matcher = NodeMatcher(graph)
    with open(file_path, newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            hero = matcher.match("hero", name=row['hero']).first()
            comic = matcher.match("comic", name=row['comic']).first()
            # Skip rows whose endpoints were not loaded as nodes
            if hero is not None and comic is not None:
                graph.create(Relationship(hero, "APPEARS_IN", comic))

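# Create indexes on the name property so the per-row lookups can use them
# (legacy syntax; Neo4j 5+ expects: CREATE INDEX FOR (n:hero) ON (n.name))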
start_time = time.time()
graph.run("CREATE INDEX ON :hero(name)")
graph.run("CREATE INDEX ON :comic(name)")
index_creation_time = time.time() - start_time
print(f"Index creation time: {index_creation_time:.2f} seconds")

# Upload nodes from nodes.csv
start_time = time.time()
create_nodes_from_csv("data/nodes.csv")
node_creation_time = time.time() - start_time
print(f"Node creation time: {node_creation_time:.2f} seconds")

# Upload relationships from edges.csv
start_time = time.time()
create_relationships_from_csv("data/edges.csv")
relationship_creation_time = time.time() - start_time
print(f"Relationship creation time: {relationship_creation_time:.2f} seconds")


/home/jovyan/work
Index creation time: 0.75 seconds
Node creation time: 40.31 seconds
Relationship creation time: 304.51 seconds
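
The relationship step dominates the runtime above, since each row issues its own lookups and its own write. One common way to cut the number of round trips is to send rows in batches with Cypher's `UNWIND`. The following is a minimal sketch of that idea, not the approach used in this notebook. The names `chunked` and `upload_relationships_batched` and the batch size of 1000 are hypothetical, and `run_query` is assumed to behave like py2neo's `graph.run`, which accepts a Cypher string and a parameters dictionary.

```python
from itertools import islice

# Cypher that creates one APPEARS_IN relationship per row in $rows.
# Labels and property names mirror the ones used in this notebook.
BATCH_QUERY = """
UNWIND $rows AS row
MATCH (h:hero {name: row.hero})
MATCH (c:comic {name: row.comic})
CREATE (h)-[:APPEARS_IN]->(c)
"""

def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def upload_relationships_batched(run_query, rows, batch_size=1000):
    """Send rows to the server in batches; return the number of batches sent.

    `run_query` is assumed to work like py2neo's graph.run(cypher, parameters).
    """
    batches = 0
    for batch in chunked(rows, batch_size):
        run_query(BATCH_QUERY, {"rows": batch})
        batches += 1
    return batches
```

With a live connection, this might be called as `upload_relationships_batched(graph.run, csv.DictReader(csvfile))` inside the same `with open(...)` block used earlier. Rows whose hero or comic is missing are skipped, as in the original function. Unlike `.first()`, however, `MATCH` creates a relationship for every node that shares the name, so the sketch assumes names are unique within each label.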
