This repository contains the code and datasets used in [M-Cypher: A GQL Supporting Motifs]. M-Cypher is built on Cypher and adds support for motif-related queries.
- To express motif-related operations that are difficult to express declaratively in Cypher.
- To provide access to motif-related functionality that is not available in Cypher.
  - For example, subgraph matching and motif connectivity.
- To provide a uniform interface for third-party applications that work with motifs.
  - For example, motif-based methods have proved more effective for clustering, node ranking, and link prediction.
MATCH (A) WITH A, size((A)--()) as degree WHERE degree>4000 CALL algo.pageRank.stream(null, null, {iterations:20, dampingFactor:0.85, sourceNodes: [A]}) YIELD nodeId, score RETURN algo.asNode(nodeId) AS page,score ORDER BY score DESC
- To provide user-friendly input and output.
- To comply, in principle, with the GQL standard.
- To embed advanced features into queries, e.g., the motif adjacency matrix and the proximity (distance) matrix with respect to motifs.
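As a rough illustration of the last feature, here is a minimal Python sketch for the triangle motif. It assumes one common definition of the motif adjacency matrix (entry `W[i][j]` counts the triangles that contain both `i` and `j`) and takes the motif proximity matrix to be the hop distance over the nonzero entries of `W`. Both definitions and the function names `triangle_motif_adjacency` and `motif_distance_matrix` are our assumptions, not M-Cypher's API.

```python
from collections import deque
from itertools import combinations


def triangle_motif_adjacency(n, edges):
    """W[i][j] = number of triangles containing both node i and node j.

    Nodes are 0..n-1; edges are undirected (u, v) pairs.
    """
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops
            nbrs[u].add(v)
            nbrs[v].add(u)
    W = [[0] * n for _ in range(n)]
    for u in range(n):
        # v < w by construction; requiring u < v visits each triangle once
        for v, w in combinations(sorted(nbrs[u]), 2):
            if u < v and w in nbrs[v]:
                for a, b in ((u, v), (u, w), (v, w)):
                    W[a][b] += 1
                    W[b][a] += 1
    return W


def motif_distance_matrix(W):
    """Hop distances over the nonzero entries of W (None = unreachable)."""
    n = len(W)
    D = [[None] * n for _ in range(n)]
    for s in range(n):
        D[s][s] = 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for v in range(n):
                if W[u][v] and D[s][v] is None:
                    D[s][v] = D[s][u] + 1
                    queue.append(v)
    return D


# Triangles (0, 1, 2) and (2, 3, 4); edge (4, 5) lies on no triangle
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
W = triangle_motif_adjacency(6, edges)
D = motif_distance_matrix(W)
print(D[0][3], D[0][5])  # 2 None
```

Node 5 is adjacent to node 4 in the input graph but is unreachable in the motif proximity matrix, because its only edge is not part of any triangle.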
The code takes the graph as an edge list: each row is an edge between two nodes, with fields separated by commas. The datasets used in the paper are included in the `data/` directory.
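- Nodes file (`nodes`), one node per row: `nodeID,nodeLabelID,nodeName`
- Edges file (`edges`), one edge per row: `nodeID1,nodeID2,edgeLabelID`
- Label mappings, one per row: `nodeLabelID:nodeLabel` for nodes and `edgeLabelID:edgeLabel` for edges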
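A minimal Python sketch of reading these files, assuming there is no header row (the import queries in this README use `LOAD CSV` without `WITH HEADERS`) and that IDs are integers. The helper names `load_graph` and `load_labels` are hypothetical; edges are treated as directed from `nodeID1` to `nodeID2`, which matches how the import queries create relationships.

```python
import csv
import io
from collections import defaultdict


def load_graph(nodes_text, edges_text):
    """Parse the nodes and edges files into Python structures.

    Returns (nodes, adj): nodes maps nodeID -> (nodeLabelID, nodeName);
    adj maps nodeID1 -> list of (nodeID2, edgeLabelID) for its outgoing edges.
    """
    nodes = {}
    for row in csv.reader(io.StringIO(nodes_text)):
        if row:  # csv.reader yields [] for blank lines
            nodes[int(row[0])] = (int(row[1]), row[2])
    adj = defaultdict(list)
    for row in csv.reader(io.StringIO(edges_text)):
        if row:
            adj[int(row[0])].append((int(row[1]), int(row[2])))
    return nodes, dict(adj)


def load_labels(text):
    """Parse 'labelID:labelName' lines into {labelID: labelName}."""
    labels = {}
    for line in text.splitlines():
        if line.strip():
            label_id, name = line.strip().split(":", 1)
            labels[int(label_id)] = name
    return labels


# Toy input: a Virus (node label 1) producing a VirusProtein (node label 2) via edge label 4
nodes, adj = load_graph("1,1,v1\n2,2,p1\n", "1,2,4\n")
print(nodes)  # {1: (1, 'v1'), 2: (2, 'p1')}
print(adj)  # {1: [(2, 4)]}
```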
- Motif input [Matin];
- Output visualization [Matin];
- M-Cypher parser [Xiaodong];
- 4-page paper [Xiaodong];
- Declarative functionality [Xiaodong]:
  - Motif counting (see Q1);
  - Motif instance enumeration (a.k.a. isomorphic subgraph detection, see Q2);
  - Motif-path finding (e.g., nodes reachable through triangle connectivity, see Q3);
  - Motif-component finding (e.g., k-cliques);
- Embedded API functionality [Matin and Xiaodong]:
  - Motif PageRank for better node ranking;
  - Motif conductance for better graph clustering;
  - Motif discovery;
  - Motif adjacency matrix calculation;
  - Motif feature vectors for better link prediction.
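Motif PageRank and motif conductance can both be sketched on top of a motif adjacency matrix `W`. This Python sketch assumes one common formulation: ordinary PageRank and ordinary conductance computed on the weighted graph `W`. For three-node motifs such as the triangle, conductance on `W` coincides with motif conductance. The functions `motif_pagerank` and `motif_conductance` are hypothetical and are not the repository's implementation.

```python
def motif_pagerank(W, damping=0.85, iterations=100, tol=1e-10):
    """PageRank on the weighted graph whose weights are the motif adjacency matrix W."""
    n = len(W)
    out = [sum(row) for row in W]  # motif-weighted degree of each node
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # rank held by nodes in no motif instance is spread uniformly
        dangling = sum(rank[i] for i in range(n) if out[i] == 0)
        new = [(1.0 - damping + damping * dangling) / n] * n
        for i in range(n):
            if out[i]:
                for j in range(n):
                    if W[i][j]:
                        new[j] += damping * rank[i] * W[i][j] / out[i]
        converged = sum(abs(a - b) for a, b in zip(new, rank)) < tol
        rank = new
        if converged:
            break
    return rank


def motif_conductance(W, S):
    """cut_W(S) / min(vol_W(S), vol_W(rest)); inf if either side has zero volume."""
    S = set(S)
    n = len(W)
    cut = sum(W[i][j] for i in S for j in range(n) if j not in S)
    vol_s = sum(sum(W[i]) for i in S)
    vol_rest = sum(sum(W[i]) for i in range(n) if i not in S)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else float("inf")


# Triangle motif adjacency matrix of triangles (0, 1, 2) and (2, 3, 4) sharing node 2
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]
ranks = motif_pagerank(W)
print(max(range(5), key=lambda i: ranks[i]))  # 2
print(motif_conductance(W, {0, 1}))  # 0.5
```

Node 2 ranks highest because it takes part in both triangles, and the set `{0, 1}` cuts exactly one triangle.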
The following query examples demonstrate three use cases (Q1, Q2, and Q3) using a motif M that the user defines beforehand in the GUI.
Q1 (motif counting):
- cypher:
MATCH p=(:Country)<-[:from_country]-(:Strain)-[:mutate_from_branch]->(:Branch) RETURN COUNT (p)
- m-cypher:
MATCH (m:M) RETURN COUNT (m)
- What if M is a large motif?
  - Describing a large M with path-pattern queries in Cypher is almost impossible.
  - Even when it can be done, the result contains many duplicates.
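The duplicate problem is easy to reproduce on a toy graph. The Python sketch below is an illustration, not M-Cypher's matcher: it counts an undirected triangle pattern once per injective assignment of the pattern variables, roughly the way a path-pattern query returns one row per variable binding, and compares that with the number of distinct instances. A triangle has 6 automorphisms, so on the complete graph K4 its 4 triangles come back as 24 matches. The function name `count_matches` is hypothetical.

```python
from itertools import combinations, permutations


def count_matches(nodes, edges, pattern_edges, k):
    """Return (ordered embeddings, distinct instances) of an undirected pattern.

    pattern_edges uses pattern variables 0..k-1. An embedding is one injective
    assignment of graph nodes to variables; an instance is the set of graph
    edges it covers, so automorphic assignments collapse into one instance.
    """
    graph_edges = {frozenset(e) for e in edges}
    embeddings = 0
    instances = set()
    for assignment in permutations(nodes, k):  # brute force: toy graphs only
        mapped = frozenset(
            frozenset((assignment[a], assignment[b])) for a, b in pattern_edges
        )
        if mapped <= graph_edges:
            embeddings += 1
            instances.add(mapped)
    return embeddings, len(instances)


# Triangle pattern on the complete graph K4
triangle = [(0, 1), (1, 2), (2, 0)]
k4 = list(combinations(range(4), 2))
print(count_matches(range(4), k4, triangle, 3))  # (24, 4)
```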
Q2 (motif instance enumeration):
- cypher:
MATCH (a:Location)<-[:from_location]-(b:Strain)-[:mutate_from_branch]->(c:Branch) RETURN a,b,c
- m-cypher:
MATCH (m:M) RETURN m
- The same problems as in Q1 apply.
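For a labeled, directed motif such as the one in Q2, instance enumeration can be sketched with a naive backtracking matcher. This Python sketch is our illustration only, not the algorithm M-Cypher uses: it assigns distinct graph nodes to the pattern variables one at a time and checks node labels and relationship types as soon as both endpoints of a pattern edge are bound. The function name `enumerate_instances` and the node IDs in the example are hypothetical.

```python
def enumerate_instances(node_labels, edges, pattern_labels, pattern_edges):
    """Yield tuples (node for var 0, ..., node for var k-1) matching the pattern.

    node_labels: dict node -> label; edges: set of (src, dst, relType);
    pattern_labels: label of each pattern variable 0..k-1;
    pattern_edges: list of (i, j, relType) between pattern variables.
    """
    k = len(pattern_labels)
    candidates = [
        [n for n, lab in node_labels.items() if lab == pattern_labels[i]]
        for i in range(k)
    ]

    def consistent(assign, i):
        # check only pattern edges whose later endpoint is variable i
        return all(
            (assign[a], assign[b], rel) in edges
            for a, b, rel in pattern_edges
            if max(a, b) == i
        )

    def extend(assign):
        i = len(assign)
        if i == k:
            yield tuple(assign)
            return
        for node in candidates[i]:
            if node in assign:  # keep the mapping injective
                continue
            assign.append(node)
            if consistent(assign, i):
                yield from extend(assign)
            assign.pop()

    yield from extend([])


node_labels = {"l1": "Location", "l2": "Location", "s1": "Strain", "b1": "Branch"}
edges = {
    ("s1", "l1", "from_location"),
    ("s1", "l2", "from_location"),
    ("s1", "b1", "mutate_from_branch"),
}
# Q2 pattern: (a:Location)<-[:from_location]-(b:Strain)-[:mutate_from_branch]->(c:Branch)
pattern_labels = ["Location", "Strain", "Branch"]
pattern_edges = [(1, 0, "from_location"), (1, 2, "mutate_from_branch")]
print(sorted(enumerate_instances(node_labels, edges, pattern_labels, pattern_edges)))
# [('l1', 's1', 'b1'), ('l2', 's1', 'b1')]
```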
Q3 (motif-path finding):
- cypher: not expressible (N/A)
- m-cypher:
MATCH (a:Location {name: $a})-[m:M*]->(b:Location {name: $b}) RETURN m
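The exact semantics of a motif path are an assumption here. If two consecutive motif instances on a path must share at least one node, the nodes reachable from a source can be sketched in Python with union-find over a list of motif instances, for example triangles produced by an enumerator. Both that semantics and the helper `motif_reachable` are hypothetical.

```python
def motif_reachable(instances, source):
    """Nodes reachable from source via chains of motif instances in which
    consecutive instances share at least one node (union-find over nodes)."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for inst in instances:
        for node in inst:
            parent.setdefault(node, node)
        root = find(inst[0])
        for node in inst[1:]:  # merge every node of the instance into one set
            other = find(node)
            if other != root:
                parent[other] = root
    if source not in parent:
        return set()  # source lies in no motif instance
    root = find(source)
    return {node for node in parent if node != source and find(node) == root}


triangles = [(0, 1, 2), (2, 3, 4), (6, 7, 8)]
print(sorted(motif_reachable(triangles, 0)))  # [1, 2, 3, 4]
```

Requiring consecutive instances to share more than one node (as k-clique percolation does) would need a different merge rule.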
- Install Neo4j Desktop.
- Put the data files into `<neo4j-home>/import`, for example `C:/Users/<user>/.Neo4jDesktop/neo4jDatabases/<database>/installation-<version>/import` on Windows.
- Import the COVID-19 data into Neo4j:
// Put 'nodes' and 'edges' in '<neo4j-home>/import' beforehand.
// Run each LOAD CSV statement below as a separate query.
// Nodes (nodeID,nodeLabelID,nodeName): 0 Host, 1 Virus, 2 VirusProtein, 3 HostProtein, 4 Drug
LOAD CSV FROM 'file:///nodes' AS line
WITH line WHERE line[1] = '0'
CREATE (:Host {id: toInteger(line[0]), label: line[2]});
LOAD CSV FROM 'file:///nodes' AS line
WITH line WHERE line[1] = '1'
CREATE (:Virus {id: toInteger(line[0]), label: line[2]});
LOAD CSV FROM 'file:///nodes' AS line
WITH line WHERE line[1] = '2'
CREATE (:VirusProtein {id: toInteger(line[0]), label: line[2]});
LOAD CSV FROM 'file:///nodes' AS line
WITH line WHERE line[1] = '3'
CREATE (:HostProtein {id: toInteger(line[0]), label: line[2]});
LOAD CSV FROM 'file:///nodes' AS line
WITH line WHERE line[1] = '4'
CREATE (:Drug {id: toInteger(line[0]), label: line[2]});
// Edges (nodeID1,nodeID2,edgeLabelID): filter on the edge label first so MATCH only runs for relevant rows
LOAD CSV FROM 'file:///edges' AS line
WITH line WHERE line[2] = '0'
MATCH (n:Drug {id: toInteger(line[0])}), (m:Virus {id: toInteger(line[1])})
MERGE (n)-[:Effect]->(m);
LOAD CSV FROM 'file:///edges' AS line
WITH line WHERE line[2] = '1'
MATCH (n:HostProtein {id: toInteger(line[0])}), (m:VirusProtein {id: toInteger(line[1])})
MERGE (n)-[:Interact]->(m);
LOAD CSV FROM 'file:///edges' AS line
WITH line WHERE line[2] = '2'
MATCH (n:VirusProtein {id: toInteger(line[0])}), (m:HostProtein {id: toInteger(line[1])})
MERGE (n)-[:Bind]->(m);
LOAD CSV FROM 'file:///edges' AS line
WITH line WHERE line[2] = '3'
MATCH (n:HostProtein {id: toInteger(line[0])}), (m:Host {id: toInteger(line[1])})
MERGE (n)-[:Belong_to]->(m);
LOAD CSV FROM 'file:///edges' AS line
WITH line WHERE line[2] = '4'
MATCH (n:Virus {id: toInteger(line[0])}), (m:VirusProtein {id: toInteger(line[1])})
MERGE (n)-[:Produce]->(m);
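In these statements, `MATCH` silently drops an edge row whose endpoints are missing or carry the wrong label, so a quick pre-import check can be useful. This hypothetical Python helper, `skipped_edges`, encodes the label IDs exactly as the import statements use them and reports the edge rows the import would skip. Rows are lists of strings, as `csv.reader` returns them.

```python
# Label IDs as used by the import statements in this README
NODE_LABELS = {0: "Host", 1: "Virus", 2: "VirusProtein", 3: "HostProtein", 4: "Drug"}
EDGE_SCHEMA = {  # edgeLabelID -> (source label, relationship type, target label)
    0: ("Drug", "Effect", "Virus"),
    1: ("HostProtein", "Interact", "VirusProtein"),
    2: ("VirusProtein", "Bind", "HostProtein"),
    3: ("HostProtein", "Belong_to", "Host"),
    4: ("Virus", "Produce", "VirusProtein"),
}


def skipped_edges(node_rows, edge_rows):
    """Return edge rows the import would skip: unknown edge label, missing
    endpoint, or endpoint labels that do not match EDGE_SCHEMA."""
    label_of = {int(r[0]): NODE_LABELS.get(int(r[1])) for r in node_rows}
    bad = []
    for row in edge_rows:
        src, dst, rel = int(row[0]), int(row[1]), int(row[2])
        schema = EDGE_SCHEMA.get(rel)
        if (
            schema is None
            or label_of.get(src) != schema[0]
            or label_of.get(dst) != schema[2]
        ):
            bad.append(row)
    return bad


node_rows = [["1", "1", "v1"], ["2", "2", "p1"], ["3", "0", "h1"]]
edge_rows = [["1", "2", "4"], ["3", "2", "4"], ["1", "2", "9"]]
print(skipped_edges(node_rows, edge_rows))  # [['3', '2', '4'], ['1', '2', '9']]
```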
- See `codes/mc-explorer`.
- About motif input: see `codes/mc-explorer/platform/WebContent/js/graphM.js`, `codes/mc-explorer/platform/WebContent/js/utilities.js`, and `codes/mc-explorer/platform/WebContent/js/graphResult.js`.
- About graph visualization: cytoscape.
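If the explorer uses Cytoscape.js (an assumption; the text only says cytoscape), query results have to be converted into its elements format, where each node or edge is an object with a `data` field and an optional `classes` string. The Python sketch below is hypothetical and is not taken from `graphResult.js`: it builds such a list and tags the nodes of one matched motif instance, and the edges between them, with a `motif` class for styling.

```python
import json


def to_cytoscape_elements(node_names, edges, instance_nodes=()):
    """Build a Cytoscape.js-style elements list.

    node_names: dict id -> display name; edges: list of (source, target, relType);
    instance_nodes: ids of one matched motif instance; those nodes and the edges
    whose both endpoints are among them get the class "motif".
    """
    highlight = set(instance_nodes)
    elements = []
    for nid, name in node_names.items():
        element = {"data": {"id": str(nid), "label": name}}
        if nid in highlight:
            element["classes"] = "motif"
        elements.append(element)
    for i, (src, dst, rel) in enumerate(edges):
        element = {
            "data": {"id": f"e{i}", "source": str(src), "target": str(dst), "label": rel}
        }
        if src in highlight and dst in highlight:
            element["classes"] = "motif"
        elements.append(element)
    return elements


elements = to_cytoscape_elements(
    {1: "v1", 2: "p1", 3: "h1"}, [(1, 2, "Produce")], instance_nodes=[1, 2]
)
print(json.dumps(elements, indent=2))
```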