Skip to content
This repository has been archived by the owner on Apr 22, 2020. It is now read-only.

Latest commit

 

History

History
219 lines (148 loc) · 8.94 KB

readme.adoc

File metadata and controls

219 lines (148 loc) · 8.94 KB

Efficient Graph Algorithms for Neo4j

Build Status

This library provides efficiently implemented, parallel versions of common graph algorithms for Neo4j 3.x exposed as Cypher procedures.

You can find the documentation here http://neo4j-contrib.github.io/neo4j-graph-algorithms

Algorithms

Centralities

These algorithms determine the importance of distinct nodes in a network.

  • Page Rank (algo.pageRank)

  • Betweenness Centrality (algo.betweenness)

  • Closeness Centrality (algo.closeness)

Community Detection

These algorithms evaluate how a group is clustered or partitioned, as well as its tendency to strengthen or break apart.

  • Louvain (algo.louvain)

  • Label Propagation (algo.labelPropagation)

  • (Weakly) Connected Components (algo.unionFind)

  • Strongly Connected Components (algo.scc)

  • Triangle Count / Clustering Coefficient (algo.triangleCount)

Path Finding

These algorithms help find the shortest path or evaluate the availability and quality of routes.

  • Minimum Weight Spanning Tree (algo.mst)

  • All Pairs- and Single Source - Shortest Path (algo.shortestPath, algo.allShortestPaths)

These procedures work either on the whole graph or on a subgraph filtered by label and relationship-type. You can also use filtering and projection using Cypher queries.

We’d love your feedback, so please try out these algorithms and let us know how well they work for your use-case. Also please note things that are missing from the installation instructions or documentation.

Please raise GitHub issues for anything you encounter or join the neo4j-users Slack group and ask in the #neo4j-graph-algorithm channel.

Installation

Download graph-algorithms-algo-*.jar from the matching release and copy it into your $NEO4J_HOME/plugins directory.

Because the algorithms use the lower level Kernel API to read from and write to Neo4j you will also have to enable them in the configuration (for security reasons):

Add to $NEO4J_HOME/conf/neo4j.conf
dbms.security.procedures.unrestricted=algo.*

Once you’ve done that restart Neo4j and execute the query CALL algo.list() to see a list of all the algorithms.

You can also see the full list in the documentation.

Usage

These algorithms are exposed as Neo4j procedures. We can call them directly from Cypher in your Neo4j Browser, from cypher-shell, or your client code.

For most algorithms we provide two procedures:

  • one named algo.<name> that writes results back to the graph as node-properties and reports statistics.

  • another named algo.<name>.stream that returns a stream of data, e.g. node-ids and computed values.

For large graphs the streaming procedure might return millions or billions of results, that’s why it is often more convenient to store the results of the algorithm and then use them with later queries.

We can project the graph we want to run algorithms on with either label and relationship-type projection or cypher projection.

+----------+label/rel type projection +-----------+
|  Neo4j   +------------------------->| Projected |  Execute algorithm
| stored   |    cypher projection     |   graph   |<-------------------
|  graph   +------------------------->|           |
+----------+                          +-----------+

Projected graph model is separate from Neo4j’s stored graph model to allow fast cache for the topology of the graph containing only relevant nodes, relations and in addition the weights. For now the projected graph model does not support multiple relationships between a single pair of nodes. During projection we allow only one relationship between a pair of nodes per direction (in, out) in the directed case but two relationships for BOTH the undirected case.

Label and relationship-type projection

We can project the subgraph we want to run the algorithm on using label parameter to describe nodes and relationship-type to describe relationships.

The general call syntax is:

CALL algo.<name>([label],[relationshipType],{config})

e.g. Page Rank on DBpedia (11M nodes, 116M relationships):

CALL algo.pageRank('Page','Link',{iterations:5, dampingFactor:0.85, write: true, writeProperty:'pagerank'});
// YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, write, writeProperty

CALL algo.pageRank.stream('Page','Link',{iterations:5, dampingFactor:0.85})
YIELD node, score
RETURN node.title, score
ORDER BY score DESC LIMIT 10;

Cypher projection

If label and relationship-type projection is not selective enough to describe our subgraph to run the algorithm on, we can use Cypher statements to project subsets of our graph. Use a node-statement instead of the label parameter and a relationship-statement instead of the relationship-type and use graph:'cypher' in the config.

Relationships described in the relationship-statement will only be projected if both source and target nodes are described in the node-statement. All other relationships that don’t have both source and target nodes described in node-statement will be ignored in projection.

We can also return a property value or weight (according to our config) in addition to the id’s from these statements.

Cypher projection allows us to be more expressive in describing our subgraph we want to analyse, but might take longer to project the graph with more complex cypher queries.

CALL algo.pageRank(
'MATCH (p:Page) RETURN id(p) as id',
'MATCH (p1:Page)-[:Link]->(p2:Page) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', iterations:5, write: true});

Cypher projection can also be used to project a virtual (non-stored) graph. Here is an example of how to project an undirected graph of people who visited the same web page and run Louvain community detection algorithm on it using the number of common visited web pages between pairs of people as relationship weight.

CALL algo.louvain(
'MATCH (p:Person) RETURN id(p) as id',
'MATCH (p1:Person)-[:Visit]->(:Page)<-[:Visit]-(p2:Person)
RETURN id(p1) as source, id(p2) as target, count(*) as weight',
{graph:'cypher', iterations:5, write: true});

The detailed call syntax and all parameters and possible return values for each algorithm are listed in the project’s documentation

Graph Loading

As it can take some time to load large graphs into the algorithm data structures, you can pre-load graphs and then later refer to them by name for several graph algorithms. After usage they can be removed from memory to free resources used.

// Load graph
CALL algo.graph.load('my-graph','Label','REL_TYPE',{graph:'heavy',..other config...})
  YIELD name, graph, direction, undirected, sorted, nodes, loadMillis, alreadyLoaded,
        nodeWeight, relationshipWeight, nodeProperty, loadNodes, loadRelationships;

// Info on loaded graph
CALL algo.graph.info('my-graph')
  YIELD name, type, exists, removed, nodes;

// Use graph
CALL algo.pageRank(null,null,{graph:'my-graph',...})


// Remove graph
CALL algo.graph.remove('my-graph')
  YIELD name, type, exists, removed, nodes;

Building Locally

Currently aiming at Neo4j 3.x (with a branch per version)

git clone https://github.com/neo4j-contrib/neo4j-graph-algorithms
cd neo4j-graph-algorithms
git checkout 3.3
mvn clean install
cp algo/target/graph-algorithms-*.jar $NEO4J_HOME/plugins/
$NEO4J_HOME/bin/neo4j restart