# Cypher Query Language
Unlike relational databases, which share SQL, graph databases do not have a uniform query language. Platforms typically build their own. Cypher was originally built for Neo4j but was later opened up via the openCypher project (https://opencypher.org/).

## Preliminaries
To start querying our Neo4j database, we have two options:
* Go to http://localhost:7474 to access Neo4j's console
* Use this notebook, which combines [py2neo](https://py2neo.org/2021.1/) and [ipycytoscape](https://github.com/cytoscape/ipycytoscape) to render graph visualizations inside Jupyter. The notebook focuses on writing Cypher queries, so we won't delve into the libraries themselves. The `cypher` function provided below takes care of running your queries and plotting the results. To learn more about these libraries, follow the links above.

### Importing libraries

In [1]:
import os
import requests
import ipycytoscape
from py2neo import Graph
import seaborn as sns
import random
from dotenv import load_dotenv

### Loading environment variables and connecting to the Neo4j database

In [2]:
load_dotenv()
graph = Graph(os.getenv('NEO4j_URL'), auth=(os.getenv('NEO4J_USER'), os.getenv('NEO4J_PASSWORD')))
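If one of these variables is unset, `os.getenv` returns `None` and the connection attempt fails with a less obvious error. A minimal sketch of checking them up front; the helper name `read_neo4j_settings` is hypothetical, and the variable names are copied verbatim from the cell above:

```python
import os

def read_neo4j_settings(names=("NEO4j_URL", "NEO4J_USER", "NEO4J_PASSWORD"), env=None):
    """Return the requested settings as a dict, or raise if any is unset or empty."""
    # env defaults to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

Calling `read_neo4j_settings()` after `load_dotenv()` would surface a missing entry in the `.env` file before `Graph(...)` is constructed.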

### Cypher query function
This function lets us run Cypher queries and see the results quickly, much like using the Neo4j console. Feel free to look at the following libraries for more information:
* [py2neo](https://py2neo.org/2021.1/)
* [cytoscape.js](https://js.cytoscape.org/)
* [ipycytoscape](https://github.com/cytoscape/ipycytoscape)

In [3]:
style = [{
                            "selector": 'node',
                            "style": {
                            'background-color': 'data(color)',
                            'border-color': 'data(borderColor)',
                            'border-width': 'data(borderWidth)',
                            'label': 'data(label)',
                            "text-valign": "center",
                            "text-halign": "center",
                            'width': "50",
                            'height': "50",
                            }
                        },
                        {
                            "selector": 'edge',
                            "style": {
                            'curve-style': 'straight',
                            'line-color': 'data(lineColor)',
                            'width': '3',
                            'label': 'data(relation)',
                            "text-rotation": "autorotate",
                            "text-margin-x": "0px",
                            "text-margin-y": "0px",
                            'font-size': '12px',
                            'target-arrow-shape': "data(directed)",
                            'target-endpoint': 'outside-to-node',
                            'source-endpoint': 'outside-to-node',
                            'target-arrow-color': 'data(lineColor)',
                            }
                        },
                        {
                            "selector": 'node.highlight',
                            "style": {
                                'border-color': 'gray',
                                'border-width': '2px',
                                'font-weight': 'bold',
                                'font-size': '18px',
                                'width': "90",
                                'height': "90",
                            }
                        },
                        {
                            "selector": 'node.focused',
                            "style": {
                                'border-color': 'gray',
                                'border-width': '2px',
                                'font-weight': 'bold',
                                'font-size': '18px',
                                'width': "90",
                                'height': "90",
                            }
                        },
                        {
                            "selector": 'edge.focusedColored',
                            "style": {
                                'line-color': '#F8333C',
                                'width': '6'
                            }
                        },
                        {
                            "selector": 'node.semitransp',
                            "style":{ 'opacity': '0.5' }
                        },
                        {
                            "selector": 'node.focusedSemitransp',
                            "style":{ 'opacity': '0.5' }
                        },
                        {
                            "selector": 'edge.colored',
                            "style": {
                                'line-color': '#F8333C',
                                'target-arrow-color': '#F8333C',
                                'width': '6'
                            }
                        },
                        {
                            "selector": 'edge.semitransp',
                            "style":{ 'opacity': '0.5' }
                        },
                        {
                            "selector": 'edge.focusedSemitransp',
                            "style":{ 'opacity': '0.5' }
                        }]

In [4]:
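palette = sns.color_palette().as_hex()

def cypher(query):
    """Run a Cypher query and return the result as a Cytoscape widget."""
    if query == "":
        return
    results = graph.run(query).data()
    nodes = []
    edges = []
    colors = {}  # node label -> color, assigned in order of first appearance
    for record in results:
        # Assumes every returned value is a node, relationship or path,
        # so that it exposes .nodes and .relationships
        for vals in record.values():
            for node in vals.nodes:
                label = str(node.labels)
                if label not in colors:
                    # Cycle through the palette when there are more labels than colors
                    colors[label] = palette[len(colors) % len(palette)]
                color = colors[label]
                n = {"kind": label, "color": color}
                # Copy the node's properties into the element
                for k, v in node.items():
                    n[k] = v
                nodes.append(n)
            for relation in vals.relationships:
                # Edges refer to their endpoints through the nodes' "id" property
                r = {
                    "kind": "relation",
                    "source": relation.nodes[0]["id"],
                    "target": relation.nodes[1]["id"]
                    }
                for k, v in relation.items():
                    r[k] = v
                edges.append(r)
    cytoscapeobj = ipycytoscape.CytoscapeWidget()
    cytoscapeobj.graph.add_graph_from_json({
        "nodes": nodes,
        "edges": edges
    })
    cytoscapeobj.set_style(style)
    return cytoscapeobj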
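
The loop above appends a node every time it appears in a result, so a node shared by several paths is collected more than once. A minimal sketch of removing such duplicates before the widget is built, assuming (as the edge code above does) that every node carries an `id` property; `dedupe_elements` is a hypothetical helper, not part of py2neo or ipycytoscape:

```python
def dedupe_elements(nodes, edges):
    """Drop repeated nodes (by id) and repeated edges (by their full content)."""
    unique_nodes = {}
    for node in nodes:
        # Keep the first occurrence of each node id.
        unique_nodes.setdefault(node["id"], node)

    unique_edges = {}
    for edge in edges:
        # Two edges count as equal when all of their key/value pairs match.
        key = tuple(sorted((k, repr(v)) for k, v in edge.items()))
        unique_edges.setdefault(key, edge)

    return list(unique_nodes.values()), list(unique_edges.values())
```

It could be called on `nodes` and `edges` just before `add_graph_from_json`; whether duplicates actually cause visible problems depends on the widget version, so treat this as optional.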


## Cypher Query

### Querying nodes
Let's dissect this cypher query:
```MATCH (a:`KEGG Pathway` {label: 'Apoptosis'}) RETURN a```
  * The `MATCH` keyword queries for nodes/relations/paths that match the pattern that follows it.
  * Patterns wrapped in parentheses match nodes. Here `KEGG Pathway` is the node label (wrapped in backticks because it contains a space), and `{label: 'Apoptosis'}` filters on a node property that happens to be named `label`.
  * The `a` before the colon is the variable that the matched node is bound to. This is the variable that we then return and display as a graph.
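
To make the anatomy of the node pattern concrete, here is a small sketch that assembles such a query from its parts. `node_query` is a hypothetical helper written for this notebook, and it only handles string property values; for untrusted input, Cypher parameters (`$name`) are the safer choice.

```python
def node_query(variable, label, properties=None, limit=None):
    """Build a MATCH ... RETURN query for a single node pattern."""
    # Backticks allow spaces in a label; a backtick inside the label is doubled.
    pattern = f"{variable}:`{label.replace('`', '``')}`"
    if properties:
        # String values only in this sketch; backslashes and single quotes are escaped.
        filters = ", ".join(
            f"{key}: '" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"
            for key, value in properties.items()
        )
        pattern += f" {{{filters}}}"
    query = f"MATCH ({pattern}) RETURN {variable}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query

print(node_query("a", "KEGG Pathway", {"label": "Apoptosis"}))
```

The printed string can be passed to the `cypher` function like any hand-written query.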

In [6]:
query = "MATCH (a:`KEGG Pathway` {label: 'Apoptosis'}) RETURN a"
cypher(query)


Without a property filter, the query matches every node that carries the given label (and a bare `()` pattern matches every node in the database). For this reason, it is a good idea to add a `LIMIT` clause so the results stay manageable:

In [9]:
query = "MATCH (a:`KEGG Pathway`) RETURN a LIMIT 12"
cypher(query)


### Querying relations
Let's dissect this cypher query:
```MATCH ()-[r:`GO BP`]->() RETURN r LIMIT 10```
  * Patterns wrapped in square brackets match a relation. In this case we are matching GO BP edges. As with nodes, we can add curly braces to filter edges by property.
  * `()` denotes any node. These are not assigned to any variable.
  * Edges can have directions based on the arrow:
    * `()-[]->()`
    * `()<-[]-()`
  * Alternatively, `()-[]-()` matches edges regardless of their direction.
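
The three arrow forms can be illustrated with a toy in-memory matcher. This is only a sketch of the semantics, not how Neo4j evaluates patterns; the edge data and the `match_edges` helper are made up for illustration.

```python
# Toy directed edges as (source, relation type, target); illustrative data only.
EDGES = [
    ("gene1", "GO BP", "term1"),
    ("gene2", "GO BP", "term1"),
    ("gene1", "KEGG", "pathway1"),
]

def match_edges(edges, rel_type, direction):
    """Return (left, right) node pairs matched by a one-edge pattern.

    direction is '->' (left points to right), '<-' (right points to left)
    or '-' (either orientation).
    """
    pairs = []
    for source, kind, target in edges:
        if kind != rel_type:
            continue
        if direction in ("->", "-"):
            pairs.append((source, target))
        if direction in ("<-", "-"):
            pairs.append((target, source))
    return pairs

print(match_edges(EDGES, "GO BP", "->"))
print(match_edges(EDGES, "GO BP", "-"))
```

In this sketch the undirected form yields each edge twice, once per orientation, which mirrors how an undirected pattern with two unconstrained endpoints behaves in Cypher.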

In [11]:
query = "MATCH ()-[r:`GO BP`]->() RETURN r LIMIT 10"
cypher(query)


### Querying paths
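You can assign a whole path to a variable, say `p`, like this:
```MATCH p=()-[]->() RETURN p LIMIT 10```
  * `()` and `[]` denote any node and any edge, respectively.
  * Patterns can be chained: the cell below uses `()-[]->()<-[]-()`, which matches two edges that point at the same middle node.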

In [18]:
query = "MATCH p=()-[]->()<-[]-() RETURN p LIMIT 10"
cypher(query)


### Checkpoint Exercise:
Create a query that returns 10 terms that are connected to the gene KL.

In [19]:
query = "MATCH p=(:Gene {label: 'KL'})-[]-() RETURN p LIMIT 10"
cypher(query)


### Checkpoint Exercise:
Create a query that returns all terms that are connected to the gene KL through a KEGG relationship.

In [10]:
query = ""
cypher(query)

## Querying longer paths
Suppose we want to discover a path of length 2 that connects the gene MAPK1 to the gene CRKL. There are several ways of doing this:

First we can define a path using components that we already know like this:

In [22]:
query = "MATCH p=(:Gene {label: 'MAPK1'})-[]-()-[]-(:Gene {label: 'CRKL'}) RETURN p"
cypher(query)


Alternatively, we can use variable length pattern matching:

```MATCH p=(:Gene {label: 'MAPK1'})-[*2]-(:Gene {label: 'CRKL'}) RETURN p LIMIT 10```

This query lets us skip the intermediate nodes and specify only the path length inside the square brackets. Note that the two forms may differ subtly in how the database discovers and orders paths, which matters especially when a `LIMIT` is applied.

In [23]:
query = "MATCH p=(:Gene {label: 'MAPK1'})-[*2]-(:Gene {label: 'CRKL'}) RETURN p"
cypher(query)


Variable-length pattern matching is particularly useful when you want to set a minimum or maximum path length. The next query finds paths with a minimum length of 2 and a maximum length of 4:

In [24]:
query = "MATCH p=(:Gene {label: 'MAPK1'})-[*2..4]-(:Gene {label: 'CRKL'}) RETURN p LIMIT 10"
cypher(query)

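
What a `[*min..max]` pattern asks for can be sketched on a toy graph in plain Python. The `paths_between` helper below is hypothetical and ignores relation types and properties. It follows edges in either direction and, like Cypher, never reuses an edge within one path, although a node may be visited more than once.

```python
def paths_between(edges, start, end, min_hops, max_hops):
    """Enumerate undirected paths from start to end that use min_hops..max_hops edges.

    edges is a list of (source, target) pairs; each path is returned as a list of nodes.
    """
    found = []

    def walk(node, used, path):
        # Record the path whenever it currently ends at the target with a valid length.
        if node == end and min_hops <= len(used) <= max_hops:
            found.append(path)
        if len(used) == max_hops:
            return
        for index, (source, target) in enumerate(edges):
            if index in used:
                continue  # an edge may appear only once per path
            if source == node:
                walk(target, used | {index}, path + [target])
            elif target == node:
                walk(source, used | {index}, path + [source])

    walk(start, frozenset(), [start])
    return found

# Illustrative data: two intermediate terms plus one direct edge.
toy_edges = [("MAPK1", "t1"), ("t1", "CRKL"), ("MAPK1", "t2"), ("t2", "CRKL"), ("MAPK1", "CRKL")]
print(paths_between(toy_edges, "MAPK1", "CRKL", 2, 2))
```

With `min_hops` and `max_hops` both set to 2 this corresponds to `[*2]`; widening the range corresponds to `[*2..4]`.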

### Checkpoint Exercise:
Find at least 10 shared genes between the GO Biological Process Term 'apoptotic process (GO:0006915)' and the KEGG Pathway, 'Apoptosis':

In [14]:
query = ""
cypher(query)

### allShortestPaths
Another option is the `allShortestPaths` function, which returns all of the shortest paths between two nodes:

In [28]:
query = "MATCH p=allShortestPaths((:`GO Biological Process Term` {label: 'apoptotic process (GO:0006915)'})-[*]-(:`KEGG Pathway` {label: 'Apoptosis'})) RETURN p LIMIT 10"
cypher(query)

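
The idea behind returning every shortest path can be sketched with a breadth-first search on a toy undirected graph. This is an illustration of the concept, not Neo4j's implementation; `all_shortest_paths` is a hypothetical helper and assumes at most one edge between any two nodes.

```python
from collections import deque

def all_shortest_paths(edges, start, end):
    """Return every shortest undirected path from start to end as a list of nodes."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    # Breadth-first search: record each node's distance from start and every
    # predecessor that lies on some shortest path to it.
    distance = {start: 0}
    parents = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif distance[nxt] == distance[node] + 1:
                parents[nxt].append(node)

    if end not in distance:
        return []  # the two nodes are not connected

    # Follow the predecessor links backwards to rebuild every shortest path.
    def build(node):
        if node == start:
            return [[start]]
        return [path + [node] for parent in parents[node] for path in build(parent)]

    return build(end)

# Illustrative data: two genes link the term and the pathway directly, one detour is longer.
toy_edges = [("term", "g1"), ("term", "g2"), ("g1", "pathway"), ("g2", "pathway"),
             ("term", "x"), ("x", "y"), ("y", "pathway")]
print(all_shortest_paths(toy_edges, "term", "pathway"))
```

Only the two-edge routes through `g1` and `g2` are returned; the longer detour through `x` and `y` is dropped because it is not a shortest path.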

## Cypher resources

This notebook provides an overview of writing Cypher queries, but it is by no means exhaustive. To learn more, see the Cypher manual: https://neo4j.com/docs/cypher-manual/current/