Neo4j
illyfrancis edited this page Jul 26, 2013
·
6 revisions
Intro to Neo4j by Emil Eifrem
- Nodes
- Relationships between nodes
- Properties on both
- First identify the entities
- node id (mandatory attribute on a node)
- node attributes
- Nodes (e.g. Person) are connected to other nodes by relationships (e.g. "knows")
- Relationships also have properties (e.g. the "knows" has age = 3 days)
- Can add arbitrary relationships, e.g. "coded_by"
GraphDatabaseService graphDb = ... // get factory (via DI or whatever)
Transaction tx = graphDb.beginTx(); // this is important!!!
// create Thomas "Neo" Anderson
Node mrAnderson = graphDb.createNode();
mrAnderson.setProperty("name", "Thomas Anderson");
mrAnderson.setProperty("age", 29);
// create Morpheus
Node morpheus = graphDb.createNode();
morpheus.setProperty("name", "Morpheus");
morpheus.setProperty("rank", "Captain");
morpheus.setProperty("occupation", "Total bad ass");
// create a relationship representing that they know each other
mrAnderson.createRelationshipTo(morpheus, RelTypes.KNOWS);
// create Trinity, Cypher, Agent Smith, Architect similarly
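tx.success(); // mark the transaction as successful
tx.finish(); // commits if success() was called (Neo4j 1.x API; normally placed in a finally block)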
// In package org.neo4j.graphdb
public interface RelationshipType {
    String name();
}
// example of how to roll dynamic RelationshipTypes
class MyDynamicRelType implements RelationshipType {
    private final String name;
    MyDynamicRelType(String name) { this.name = name; }
    public String name() { return this.name; }
}
The API also ships a standard implementation, by the way (DynamicRelationshipType).
Dynamic relationship types are useful when the types are not known in advance. If you do know them up front, use a static relationship type instead.
// example of a static relationship type
enum MyStaticRelTypes implements RelationshipType {
    KNOWS,
    KNOWS_FOR,
}
- Traverser framework for high-performance traversing across the node space
// instantiate a traverser that returns Mr Anderson's friends
Traverser friendsTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST, // (1) or DEPTH_FIRST
    StopEvaluator.END_OF_GRAPH, // (3) when to stop
    ReturnableEvaluator.ALL_BUT_START_NODE, // (4) what to include
    RelTypes.KNOWS, // (2) a filter
    Direction.OUTGOING ); // (2.1)
// Traverse the node space and print out the result
System.out.println("Mr Anderson's friends");
for (Node friend : friendsTraverser) {
    System.out.printf("At depth %d => %s%n",
        friendsTraverser.currentPosition().depth(),
        friend.getProperty("name"));
}
// create a traverser that returns all "friends in love"
Traverser loveTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_GRAPH,
    // select everyone that has an outgoing relationship of type LOVES
    new ReturnableEvaluator() {
        public boolean isReturnableNode(TraversalPosition pos) {
            return pos.currentNode().hasRelationship(
                RelTypes.LOVES, Direction.OUTGOING);
        }
    },
    RelTypes.KNOWS,
    Direction.OUTGOING );
// Traverse the node space and print out the result
System.out.println("Who's a lover");
for (Node person : loveTraverser) {
    System.out.printf("At depth %d => %s%n",
        loveTraverser.currentPosition().depth(),
        person.getProperty("name"));
}
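To make the traversal above concrete without a Neo4j dependency, here is a minimal plain-Java sketch of what a breadth-first traverser with an "all but start node" filter computes. The class `BfsSketch`, the method `friendsByDepth`, the adjacency map and the sample edges are illustrative assumptions, not Neo4j API.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class BfsSketch {
    /**
     * Breadth-first walk over outgoing KNOWS edges.
     * Returns every reachable node except the start, mapped to its depth,
     * in the order the nodes were first visited.
     */
    public static Map<String, Integer> friendsByDepth(Map<String, List<String>> knows, String start) {
        Map<String, Integer> depth = new LinkedHashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : knows.getOrDefault(current, List.of())) {
                if (!depth.containsKey(next)) {          // visit each node only once
                    depth.put(next, depth.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        depth.remove(start);                              // "all but start node"
        return depth;
    }

    public static void main(String[] args) {
        // Hypothetical KNOWS edges, loosely following the Matrix example.
        Map<String, List<String>> knows = Map.of(
            "Thomas Anderson", List.of("Morpheus", "Trinity"),
            "Morpheus", List.of("Trinity", "Cypher"),
            "Cypher", List.of("Agent Smith"));
        friendsByDepth(knows, "Thomas Anderson")
            .forEach((name, d) -> System.out.printf("At depth %d => %s%n", d, name));
    }
}
```

The queue is what makes the walk breadth-first: all nodes at depth n are handled before any node at depth n + 1, so the recorded depth is the smallest number of hops from the start node.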
- How do you implement your domain model?
Use the delegator pattern, i.e. every domain entity wraps a Neo4j primitive:
class PersonImpl implements Person {
    private final Node underlyingNode;
    PersonImpl(Node node) { this.underlyingNode = node; }
    public String getName() { return (String) this.underlyingNode.getProperty("name"); }
    public void setName(String name) { this.underlyingNode.setProperty("name", name); }
}
- The problem is that there is a lot of "cruft" to write
- Some frameworks
- Qi4j (www.qi4j.org)
- framework for doing DDD in pure Java 5
- Defines Entities / Associations / Properties
- == Nodes / Rel's / Properties
- Neo4j is an "EntityStore" backend
- Jo4neo
- Spring Data Neo4j
- SDN is "JPA without the suck" or "JPA for graph dbs"
- How to declare a simple node-backed POJO
@NodeEntity
class Person {
    @Indexed
    private String name;
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
}
- Social data
- Spatial data (36:01)
- Nodes - (name, lat, longitude) (e.g. Omni Hotel, lat=33948, long=1937823)
- Relationship - (ROAD with property length = 3 miles)
- Then work out the shortest path between two places
- Social AND spatial data
- Financial data
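For the spatial use case above (places as nodes, ROAD relationships with a length property), the shortest-path step can be sketched in plain Java with Dijkstra's algorithm. This is an illustration only: the class `ShortestRoadSketch`, the nested-map representation and every place name other than "Omni Hotel" are assumptions, and the notes do not say which algorithm Neo4j itself uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class ShortestRoadSketch {
    /**
     * Dijkstra's algorithm over a map of place -> (neighbouring place -> road length in miles).
     * Returns the length of the shortest route, or positive infinity if there is none.
     */
    public static double shortestMiles(Map<String, Map<String, Double>> roads, String from, String to) {
        Map<String, Double> best = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> queue =
            new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        best.put(from, 0.0);
        queue.add(Map.entry(from, 0.0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Double> head = queue.remove();
            String place = head.getKey();
            double dist = head.getValue();
            if (dist > best.get(place)) continue;        // stale entry, a shorter route was found already
            if (place.equals(to)) return dist;
            for (Map.Entry<String, Double> road : roads.getOrDefault(place, Map.of()).entrySet()) {
                double candidate = dist + road.getValue();
                if (candidate < best.getOrDefault(road.getKey(), Double.POSITIVE_INFINITY)) {
                    best.put(road.getKey(), candidate);
                    queue.add(Map.entry(road.getKey(), candidate));
                }
            }
        }
        return Double.POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        // Hypothetical road network; only "Omni Hotel" and the 3-mile road come from the notes.
        Map<String, Map<String, Double>> roads = Map.of(
            "Omni Hotel", Map.of("Conference Center", 3.0, "Airport", 12.0),
            "Conference Center", Map.of("Airport", 5.0));
        System.out.println(shortestMiles(roads, "Omni Hotel", "Airport") + " miles");
    }
}
```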
- Disk-based
- Native graph storage engine with custom binary on-disk format
- Transactional
- JTA/JTS, XA, 2PC, Tx recovery, deadlock detection, multi version concurrency control (MVCC) etc
- Scales up
- Many billions of nodes/rels/props on single JVM
- Robust
- ~1k persons
- Avg 50 friends per person
- pathExists(a,b) limit depth 4
- two backends (MySQL & Neo4j), both warmed up
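What a `pathExists(a,b)` query with a depth limit computes can be sketched as a depth-limited breadth-first search in plain Java. This is not the benchmark's actual code, which the notes do not show; the class `PathExistsSketch`, the adjacency map and the sample names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PathExistsSketch {
    /** Returns true if b can be reached from a in at most maxDepth hops. */
    public static boolean pathExists(Map<String, List<String>> friends, String a, String b, int maxDepth) {
        if (a.equals(b)) return true;
        Set<String> visited = new HashSet<>();
        visited.add(a);
        List<String> frontier = List.of(a);
        // Expand the search one hop at a time, up to the depth limit.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, List.of())) {
                    if (friend.equals(b)) return true;
                    if (visited.add(friend)) next.add(friend);   // add() is false for seen nodes
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical chain of friends: ann -> bob -> cat -> dan -> eve -> fay
        Map<String, List<String>> friends = Map.of(
            "ann", List.of("bob"), "bob", List.of("cat"), "cat", List.of("dan"),
            "dan", List.of("eve"), "eve", List.of("fay"));
        System.out.println(pathExists(friends, "ann", "eve", 4));
        System.out.println(pathExists(friends, "ann", "fay", 4));
    }
}
```

With an average of 50 friends per person, the frontier can grow by up to a factor of 50 per hop, which is why the depth limit matters for this kind of query.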
- No O/R impedance mismatch
- Can easily evolve schemas
- Can represent semi-structured info
- Can represent graphs/networks (with performance)
- Lacks tool and framework support
- Few other implementations => potential lock-in
- No support for ad-hoc queries (not quite true any more) -> Query languages (Cypher)
- Cypher - the new graph query language released in 1.4
- uses pattern matching
- Example
START user = node:people-index(name = "John")
MATCH (user)-[:FRIEND_OF]->()-[:FRIEND_OF]->(fof)
RETURN fof
- Example - with filter
START user = node(1,2,3)
MATCH (user)-[:FRIEND_OF]->(friend)
WHERE friend.age > 18
RETURN friend.name
- Gremlin - a Groovy DSL for graphs
- Ex
start.outE('FRIEND_OF').inV{ it.age > 18 }.name
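For comparison, the two-hop pattern in the first Cypher example (friends of friends) can be written imperatively. The sketch below is plain Java over a hypothetical adjacency map, not Neo4j or Cypher API, and only roughly mirrors the query: it returns one entry per two-hop path, so the same person can appear more than once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FriendOfFriendSketch {
    /** Follows two outgoing FRIEND_OF hops from user and collects the end nodes. */
    public static List<String> friendsOfFriends(Map<String, List<String>> friendOf, String user) {
        List<String> result = new ArrayList<>();
        for (String friend : friendOf.getOrDefault(user, List.of())) {
            // second hop: everyone this friend is a FRIEND_OF
            result.addAll(friendOf.getOrDefault(friend, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical FRIEND_OF edges.
        Map<String, List<String>> friendOf = Map.of(
            "John", List.of("Anna", "Bob"),
            "Anna", List.of("Carl"),
            "Bob", List.of("Carl", "Dana"));
        System.out.println(friendsOfFriends(friendOf, "John"));
    }
}
```

The declarative query states only the shape of the path; the loop spells out how to walk it, which is the boilerplate a pattern-matching language removes.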
- Jython, Scala, Erlang, C#, Ruby, PHP, Groovy etc
- Neo4j HA - Clustering mode with master slave replication
- Neo4j Server with RESTful API
- Neo4j Spatial for "all nodes within 20km"
- Pretty webadmin including Gremlin and Cypher support
- and more...
- Neo4j Community (GPL)
- Neo4j Advanced (AGPL - commercially supported) - monitoring and management
- Neo4j Enterprise (AGPL / commercially supported) - high availability clustering
From neo4j.org Graph DB 101
- Example Access Authorization (28:00)
Good for read ops. Comparable to transactional, ACID-compliant relational databases for write operations
- Vertical scalability for read and write operations
- adding more hardware will assure near-linear scalability of read/write ops
- Horizontal scalability for read ops (not for writes)
- adding more nodes to a HA cluster will allow for more query throughput
- cache-sharding tech will allow for optimized read speed