## Graph Database Development
Convert your CSV files into a graph database using Neo4j.

### Downloads

1. Download [Neo4j Desktop](https://neo4j.com/download/)

### Install Packages

1. No packages need to be installed.

### Getting Started
1. Open Neo4j Desktop and log in, or create an account with either a username and password or an email address.
2. When you first open Neo4j Desktop, an intro project with a movie database is already running. Only one DBMS can run at a time, so the first step is to click 'Stop' to end this session. Once it stops, a message at the top should read 'No active DBMS'.
3. Create a new project in the left-hand sidebar by clicking 'New'. This makes a new project named 'Project'. 
4. To the right of the name 'Project' is an 'Edit' button; click it to rename your project to something more specific, then click the check mark to save the new name.
5. Depending on how you want to use and access your graph database, create one of the two DBMS types described below:

Local DBMS
1. In your new Project, click the 'Add' button on the right-hand side of the project, and choose 'Local DBMS'. 
2. Here you can rename your DBMS and give it a password (this can differ from your Neo4j Desktop password). Then click 'Create'.
3. Hover over your newly created DBMS and click 'Start'. It will take a moment to start, but soon you'll see a green 'Active' label.
4. Move your CSV file(s) to the import folder:
    1. Hover over your DBMS; next to 'Open' are three grey dots. Hover over these dots, go to 'Open Folder', and click 'import'. This opens the import folder of your Neo4j DBMS.
    2. Move your CSV file(s) into this folder by drag-and-drop or copy-and-paste, or, if you're more comfortable in the Terminal, with the `mv` or `cp` command and the folder's path.
5. Now that your DBMS is active, you can access it from Neo4j's other tools, such as Bloom and Browser: hover over your DBMS, click the down arrow next to 'Open', and choose one of the listed options. We'll cover this later, so for now stay in Desktop.
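
Once the CSV file is in the import folder, one quick way to confirm the DBMS can see it (run this later, from Neo4j Browser) is to read a few rows without writing anything. `demo.csv` is the example file name used further down in this README; substitute your own file name.

```cypher
// Read the first five rows of demo.csv without writing to the database.
// "file:///" paths are resolved relative to the DBMS's import folder.
LOAD CSV WITH HEADERS FROM "file:///demo.csv" AS row
RETURN row
LIMIT 5;
```

If the file is not in the import folder, the query fails with an error instead of returning rows. Each returned `row` is a map keyed by the CSV header names, which is what the `row.<column>` references in the import script rely on.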

Remote DBMS
1. In your new Project, click the 'Add' button on the right-hand side of the project, and choose 'Remote DBMS'. 
2. Here, you can rename your DBMS and enter the remote connection URL. Hit 'Next' and enter the username and password of your remote server. Then choose 'Save'.
3. Hover over your newly created DBMS and click 'Connect'. It will take a moment to connect, but soon you'll see a green 'Active' label.
4. On your remote server, make sure your CSV file(s) are publicly accessible by URL, since LOAD CSV will fetch them from that address.
5. Now that your DBMS is active, you can open it in Neo4j Browser by hovering over it and clicking 'Open'. We'll cover this later, so for now stay in Desktop.
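
For a remote DBMS, the same kind of check uses the file's URL instead of a `file:///` path. The URL below is a placeholder, not a real location; replace it with the public address of your own CSV file.

```cypher
// Hypothetical URL: replace with the public address of your CSV file.
// WITH HEADERS consumes the header row, so this counts data rows only.
LOAD CSV WITH HEADERS FROM "https://example.com/data/demo.csv" AS row
RETURN count(row) AS rows;
```

If the count matches the number of data rows in your file, the DBMS can reach it.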

### Create the Database: Nodes and Relationships
These steps apply to both local and remote DBMSs.

1. Open your favorite text editor and create a .cypher file.
    1. If you use Visual Studio Code, install the 'Cypher Query Language' extension, then create a new file with the .cypher extension.
    2. Create [Indexes](https://neo4j.com/docs/cypher-manual/current/indexes-for-search-performance/) or [Constraints](https://neo4j.com/docs/cypher-manual/current/constraints/) for each node label you plan to create, on the property that identifies the node.
    3. Use the [LOAD CSV](https://neo4j.com/docs/cypher-manual/current/clauses/load-csv/) clause to read your CSV files and build your graph from their data. The file path is either "file:///filename.csv" for a Local DBMS (resolved relative to the DBMS's import folder) or the URL of the publicly available CSV file on your remote server.
    4. Create the nodes
        1. MERGE matches an existing node with the given pattern, or creates it if none exists, so it prevents duplicate nodes. I recommend MERGE instead of CREATE for nodes; use CREATE only when you are certain that each node appears just once in your CSV file(s).
            1. Write a separate MERGE clause for each node label.
            2. Typically, the first node is built from the 'subject' columns of the CSV file, and the second node from the 'object' columns.
        2. Put only the primary key (the property you created the constraint on) in the MERGE pattern. The constraint's index makes this lookup fast, and merging on the key alone prevents, for example, two nodes with the same name.
        3. Then SET the node's other properties from the remaining subject and object columns in your CSV file (IDs, symbols, categories, etc.).
    5. Create the relationships
        1. I recommend CREATE for relationships, although MERGE gives relationships the same duplicate protection it gives nodes, provided both end nodes are already bound (for example, by the node MERGE clauses above). Unlike CREATE, MERGE does not add a second copy of a relationship when the script is run again.
            1. Write a separate CREATE clause for each relationship type.
        2. Create the relationship with the subject node on the left and the arrow pointing to the object node on the right: the subject acts on the object.
        3. Then SET the relationship's properties. These come from the ASSOCIATION columns in your CSV file (knowledge source, publications, etc.).
    6. The [Cypher syntax documentation](https://neo4j.com/docs/cypher-manual/current/syntax/) has more details on writing queries.
    7. End each Cypher statement with a semicolon. Neo4j Browser uses semicolons to separate statements when you paste and run several at once.


2. In Neo4j Desktop, hover over your DBMS and click 'Open'.
    1. Neo4j Browser will open.
    2. Copy the constraints from your text editor, paste them into the query bar, and run them. A message should confirm that they were created.
    3. Copy, paste, and run the rest of your query. It may take a few moments, but you will soon see a message reporting how many nodes, relationships, and properties were created.
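
Uniqueness constraints already give each key property an index. For a property you will search on that is not unique, such as `Pubchem_ID` on `Drug` nodes, you can add a plain index. The sketch below also names each schema object and adds `IF NOT EXISTS`, so re-running the statements is skipped instead of failing because an equivalent constraint or index already exists. The names `drug_name`, `gene_symbol`, and `drug_pubchem_id` are arbitrary, illustrative choices.

```cypher
// Named uniqueness constraints; IF NOT EXISTS makes a re-run a no-op
CREATE CONSTRAINT drug_name IF NOT EXISTS
FOR (d:Drug) REQUIRE d.Name IS UNIQUE;
CREATE CONSTRAINT gene_symbol IF NOT EXISTS
FOR (g:Gene) REQUIRE g.Symbol IS UNIQUE;

// Plain (non-unique) index on a property used for lookups
CREATE INDEX drug_pubchem_id IF NOT EXISTS
FOR (d:Drug) ON (d.Pubchem_ID);

// List the constraints and indexes that now exist
SHOW CONSTRAINTS;
SHOW INDEXES;
```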

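Example script, assuming a `demo.csv` whose headers match the `row.` fields used below:

```cypher
// One uniqueness constraint per node label, on the property used as its key
CREATE CONSTRAINT FOR (d:Drug) REQUIRE d.Name IS UNIQUE;
CREATE CONSTRAINT FOR (g:Gene) REQUIRE g.Symbol IS UNIQUE;

// Read demo.csv from the import folder; each row is a map keyed by header
LOAD CSV WITH HEADERS
FROM "file:///demo.csv" AS row

// Subject node: merge on the drug name, then set the other subject columns
MERGE (subject:Drug {Name: row.subject_name})
SET subject.Pubchem_ID = row.subject_id

// Object node: merge on the gene symbol, then set the other object columns
MERGE (object:Gene {Symbol: row.object_symbol})
SET object.NCBI_ID = row.object_id,
    object.Prefixes = row.object_id_prefixes

// Relationship from subject to object, with the ASSOCIATION columns as properties
CREATE (subject)-[rel:TARGETS]->(object)
SET rel.Publications = row.ASSOCIATION_Publications,
    rel.FDA_Approval_Status = row.ASSOCIATION_FDA_Approval_Status,
    rel.Knowledge_Source = row.ASSOCIATION_Knowledge_Source;
```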
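
Because the example script uses CREATE for the relationship, running it a second time adds another `TARGETS` relationship for every row. If you expect to re-run the import, a relationship-only pass that uses MERGE could look like the sketch below; it assumes the `Drug` and `Gene` nodes already exist from a first run, since MATCH skips rows whose nodes are missing.

```cypher
// Re-runnable relationship pass: match the existing end nodes by their keys,
// then MERGE so an existing TARGETS relationship is reused, not duplicated
LOAD CSV WITH HEADERS FROM "file:///demo.csv" AS row
MATCH (subject:Drug {Name: row.subject_name})
MATCH (object:Gene {Symbol: row.object_symbol})
MERGE (subject)-[rel:TARGETS]->(object)
SET rel.Publications = row.ASSOCIATION_Publications,
    rel.FDA_Approval_Status = row.ASSOCIATION_FDA_Approval_Status,
    rel.Knowledge_Source = row.ASSOCIATION_Knowledge_Source;
```

One trade-off: this keeps at most one `TARGETS` relationship per drug-gene pair, so if the CSV lists the same pair on several rows with different association values, later rows overwrite earlier ones.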

### Query your Graph Database

Neo4j User Interface
1. In Neo4j Desktop, hover over your active DBMS and click 'Open'. This will open your graph database in Neo4j Browser.
2. Here you can query your database! First, I recommend entering `:schema` and clicking 'Run' (the play-arrow button). This lists the indexes and constraints in your graph.
3. Running `CALL db.schema.visualization()` draws your node labels and relationship types and how they connect.
4. Here you can enter any Cypher query you want. Refer to the Cypher [documentation](https://neo4j.com/docs/cypher-manual/current/syntax/) for writing these queries. 
5. Depending on the RETURN clause in your query, you may be able to view the results as a graph. This is helpful for checking that your graph is designed exactly as you want, and for visualizing new discoveries!
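
A few starting queries, run one at a time in Browser. They assume the `Drug`/`Gene`/`TARGETS` model from the example script; `"EGFR"` is only a placeholder gene symbol.

```cypher
// Count nodes per label and relationships per type to confirm the import
MATCH (n)
RETURN labels(n) AS node_labels, count(*) AS nodes;

MATCH ()-[r]->()
RETURN type(r) AS relationship_type, count(*) AS relationships;

// Returning nodes and relationships lets Browser offer the graph view
MATCH (d:Drug)-[r:TARGETS]->(g:Gene)
RETURN d, r, g
LIMIT 25;

// Returning plain properties gives a table instead
MATCH (d:Drug)-[r:TARGETS]->(g:Gene {Symbol: "EGFR"})
RETURN d.Name AS drug, r.FDA_Approval_Status AS fda_status;
```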

Python Script (remote DBMS only)
1. Navigate to the docs/README_notebooks directory and open the api_development notebook. Scroll down to the "Connect to Database" section for a description of how to connect to your remote graph database.


### Future Steps
Future steps include adding more knowledge graphs to the database. One advantage of a graph database is its flexible schema: unlike in a relational database, you can add new node labels, relationship types, and properties without migrating existing data, so extending and updating the graph is comparatively simple.
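
As a sketch of what adding another knowledge graph could look like: a second CSV is loaded with the same MERGE pattern, so rows that mention an existing gene attach to the `Gene` node already in the graph instead of creating a new one. The file name `second_kg.csv`, the `Disease` label, the `ASSOCIATED_WITH` type, the constraint name, and the column names are all hypothetical.

```cypher
// Hypothetical second source: add a constraint for the new label first
CREATE CONSTRAINT disease_name IF NOT EXISTS
FOR (x:Disease) REQUIRE x.Name IS UNIQUE;

// MERGE on Symbol reuses Gene nodes created by the first import
LOAD CSV WITH HEADERS FROM "file:///second_kg.csv" AS row
MERGE (gene:Gene {Symbol: row.subject_symbol})
MERGE (disease:Disease {Name: row.object_name})
MERGE (gene)-[rel:ASSOCIATED_WITH]->(disease)
SET rel.Knowledge_Source = row.ASSOCIATION_Knowledge_Source;
```

No existing nodes or relationships need to change; the new label and relationship type simply appear alongside the old ones.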

### Additional Links
The [Neo4j documentation](https://neo4j.com/docs/) covers all of Neo4j's tools and the Cypher query language.