Graph Database Support

Amresh edited this page Jul 14, 2013 · 24 revisions

In a fast-connecting and complex world, graph databases such as Neo4j and InfiniteGraph are gaining popularity because they provide very fast access to the highly connected data found in social networks, recommendation engines, networked systems and similar problem domains.

Kundera supports Neo4j, which is probably the most popular graph database to date. Making this work has been a bumpy journey because of the complex structure of graphs and the challenge of fitting them into JPA, the specification on which Kundera is based.

If you are new to Kundera, you can start with "What is Kundera" and "Getting Started in 5 minutes".

IMDB Example

We'll use the famous IMDB example for our demonstration. Below is a graphical representation of the data that we will represent as entities. (The top two circles are Actor nodes and the bottom three are Movie nodes. They are connected by edges that represent Role relationships.)

[Figure: IMDB Example]
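Before any JPA mapping is involved, the graph in the figure can be written down as plain data. The following is a minimal sketch that uses the actors, movies and roles from the CRUD section of this page; it does not use Kundera or Neo4j at all, and the names `ImdbGraphSketch`, `RoleEdge` and `actorsIn` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a plain-Java picture of the example graph, not Kundera or Neo4j API.
class ImdbGraphSketch
{
    // An edge from an Actor node to a Movie node, labelled with the role played.
    static final class RoleEdge
    {
        final String actor;
        final String roleName;
        final String movie;

        RoleEdge(String actor, String roleName, String movie)
        {
            this.actor = actor;
            this.roleName = roleName;
            this.movie = movie;
        }
    }

    // The four Role edges of the example (two actors, three movies).
    static List<RoleEdge> edges()
    {
        List<RoleEdge> edges = new ArrayList<RoleEdge>();
        edges.add(new RoleEdge("Tom Cruise", "Ray Ferrier", "War of the Worlds"));
        edges.add(new RoleEdge("Tom Cruise", "Ethan Hunt", "Mission Impossible"));
        // The accented "e" is written as a Unicode escape so the file compiles under any source encoding.
        edges.add(new RoleEdge("Emmanuelle B\u00e9art", "Claire Phelps", "Mission Impossible"));
        edges.add(new RoleEdge("Emmanuelle B\u00e9art", "Sophie", "Hell"));
        return edges;
    }

    // Walk the edges from the movie side to find the actors who appear in it.
    static List<String> actorsIn(String movie)
    {
        List<String> actors = new ArrayList<String>();
        for (RoleEdge edge : edges())
        {
            if (edge.movie.equals(movie)) actors.add(edge.actor);
        }
        return actors;
    }

    public static void main(String[] args)
    {
        System.out.println(actorsIn("Mission Impossible"));
    }
}
```

Each `RoleEdge` corresponds to one Role relationship in the figure; the entity classes below express the same structure through JPA annotations.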

Defining Entities

Actor Node Entity

import java.util.HashMap;
import java.util.Map;

import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.ManyToMany;
import javax.persistence.MapKeyJoinColumn;
import javax.persistence.Table;

@Entity
@Table
public class Actor
{
    @Id
    @Column(name = "ACTOR_ID")
    private int id;

    @Column(name = "ACTOR_NAME")
    private String name;

    @ManyToMany(cascade = CascadeType.ALL, fetch = FetchType.EAGER)
    @MapKeyJoinColumn(name = "ACTS_IN")
    private Map<Role, Movie> movies;
    
    public void addMovie(Role role, Movie movie)
    {
        if (movies == null) movies = new HashMap<Role, Movie>();
        movies.put(role, movie);
    }
    //Constructors, getters/setters, other methods omitted
}

Movie Node Entity

import java.util.HashMap;
import java.util.Map;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.ManyToMany;
import javax.persistence.Table;

@Entity
@Table
public class Movie
{
    @Id
    @Column(name = "MOVIE_ID")
    private String id;

    @Column(name = "TITLE")
    private String title;

    @Column(name = "YEAR")
    private int year;

    @ManyToMany(fetch = FetchType.LAZY, mappedBy = "movies")
    private Map<Role, Actor> actors;

    public void addActor(Role role, Actor actor)
    {
        if (actors == null) actors = new HashMap<Role, Actor>();
        actors.put(role, actor);
    }
    //Constructors, getters/setters, other methods omitted
}

Role Relationship Entity

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToOne;
import javax.persistence.Table;

@Entity
@Table
public class Role
{
    @Id
    @Column(name = "ROLE_NAME")
    private String roleName;

    @Column(name = "ROLE_TYPE")
    private String roleType;

    @OneToOne
    private Actor actor;

    @OneToOne
    private Movie movie;

    //Constructors, getters/setters, other methods omitted
}
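Role is used as the key of the Map<Role, Movie> and Map<Role, Actor> fields, and those maps are HashMap instances. The omitted "other methods" would therefore normally need to include equals() and hashCode(); otherwise two Role objects with the same ROLE_NAME are treated as different keys. This page does not show how the original classes implement them, so the following is only a sketch, assuming that identity is based on roleName alone. `RoleKeySketch` and its nested `Role` are simplified stand-ins without JPA annotations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Role entity, reduced to what matters for use as a map key.
class RoleKeySketch
{
    static final class Role
    {
        private final String roleName;
        private final String roleType;

        Role(String roleName, String roleType)
        {
            this.roleName = roleName;
            this.roleType = roleType;
        }

        // Assumption: two roles are the same key when their IDs (roleName) match.
        @Override
        public boolean equals(Object other)
        {
            if (this == other) return true;
            if (!(other instanceof Role)) return false;
            return roleName.equals(((Role) other).roleName);
        }

        // Must agree with equals(): equal roles produce equal hash codes.
        @Override
        public int hashCode()
        {
            return roleName.hashCode();
        }
    }

    // Stores a movie title under one Role instance and looks it up with a second, equal instance.
    static String lookupWithFreshKey()
    {
        Map<Role, String> movies = new HashMap<Role, String>();
        movies.put(new Role("Ethan Hunt", "Lead Actor"), "Mission Impossible");
        return movies.get(new Role("Ethan Hunt", "Lead Actor"));
    }

    public static void main(String[] args)
    {
        System.out.println(lookupWithFreshKey()); // prints "Mission Impossible"
    }
}
```

Without the two overrides, the lookup in lookupWithFreshKey() would return null, because HashMap would fall back to object identity.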

Configuration

You need to add a persistence unit entry specific to your Neo4j configuration. Kundera currently supports Neo4j in embedded mode only, so you must provide the path of the database file.

<persistence-unit name="imdb">
    <provider>com.impetus.kundera.KunderaPersistence</provider>
    <class>com.impetus.client.neo4j.imdb.Actor</class>
    <class>com.impetus.client.neo4j.imdb.Movie</class>
    <class>com.impetus.client.neo4j.imdb.Role</class>
    <properties>
        <property name="kundera.datastore.file.path" value="target/imdb.db" />
        <property name="kundera.dialect" value="neo4j" />
        <property name="kundera.client.lookup.class"
            value="com.impetus.client.neo4j.Neo4JClientFactory" />
        <property name="kundera.cache.provider.class"
            value="com.impetus.kundera.cache.ehcache.EhCacheProvider" />
        <property name="kundera.cache.config.resource" value="/ehcache-test.xml" />
        <property name="kundera.client.property" value="kunderaNeo4JTest.xml" />
        <property name="kundera.transaction.resource.class"
            value="com.impetus.client.neo4j.Neo4JTransaction" />
    </properties>
</persistence-unit>

CRUD

//Imports
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

/** Prepare data */
        // Actors
        Actor actor1 = new Actor(1, "Tom Cruise");
        Actor actor2 = new Actor(2, "Emmanuelle Béart");

        // Movies
        Movie movie1 = new Movie("m1", "War of the Worlds", 2005);
        Movie movie2 = new Movie("m2", "Mission Impossible", 1996);
        Movie movie3 = new Movie("m3", "Hell", 2009);

        // Roles (each role points to its own actor and movie)
        Role role1 = new Role("Ray Ferrier", "Lead Actor");
        role1.setActor(actor1);
        role1.setMovie(movie1);
        Role role2 = new Role("Ethan Hunt", "Lead Actor");
        role2.setActor(actor1);
        role2.setMovie(movie2);
        Role role3 = new Role("Claire Phelps", "Lead Actress");
        role3.setActor(actor2);
        role3.setMovie(movie2);
        Role role4 = new Role("Sophie", "Supporting Actress");
        role4.setActor(actor2);
        role4.setMovie(movie3);

        // Relationships
        actor1.addMovie(role1, movie1);
        actor1.addMovie(role2, movie2);
        actor2.addMovie(role3, movie2);
        actor2.addMovie(role4, movie3);

        movie1.addActor(role1, actor1);
        movie2.addActor(role2, actor1);
        movie2.addActor(role3, actor2);
        movie3.addActor(role4, actor2);
        
        //Create Entity Manager
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("imdb");
        EntityManager em = emf.createEntityManager();

        //Write Actor and Movie nodes along with Roles
        //Please note, it's mandatory to run all modifying operations within a transaction in Neo4j
        em.getTransaction().begin();
        em.persist(actor1);
        em.persist(actor2);
        em.getTransaction().commit();

        //Fetch actors (reusing the variables declared above)
        em.clear();   //Clearing Persistence cache (not required but ensures data is fetched from database afresh)
        actor1 = em.find(Actor.class, 1);
        actor2 = em.find(Actor.class, 2);

        //Update Actors
        actor1.setName("Arnold");
        actor2.setName("Julia");
        em.getTransaction().begin();
        em.merge(actor1);
        em.merge(actor2);
        em.getTransaction().commit();

        //Delete Actors (operation is cascaded to Movies too)
        em.getTransaction().begin();
        em.remove(actor1);
        em.remove(actor2);
        em.getTransaction().commit();
 

Polyglot Persistence

Kundera's Polyglot Persistence allows you to store and retrieve parts of your business data in multiple datastores. Since it is not practical for a node in a graph to point to, for example, a record in a key-value datastore, Kundera creates what it calls "Proxy Nodes" for this purpose. This is depicted in the diagram below.

[Figure: Polyglot Persistence]

You can test this feature yourself by changing Movie entity definition to:

@Entity
@Table(name="MOVIE", schema="MyCassandraSchema@Cassandra_Persistence_Unit")
public class Movie
{
    //Same attributes here
}

Batch Insertion

Batch insertions in Neo4j bypass transactions and are not thread safe; as a result, they perform faster. They are therefore well suited to the initial import of data.

You can run persist operations as batch insertions in either of the following ways:

  1. Specify the "kundera.batch.size" property in persistence.xml, and persist operations will run in batch.
    <property name="kundera.batch.size" value="5000" />
  2. Or, pass a value for this property in a Map while creating the EntityManagerFactory.
    Map<String, String> properties = new HashMap<String, String>();
    properties.put(PersistenceProperties.KUNDERA_BATCH_SIZE, "5000");
    EntityManagerFactory emf = Persistence.createEntityManagerFactory("imdb", properties);

JPA Queries

Kundera converts JPA queries into Lucene queries and runs them directly on Lucene indexes. An example is:

    Query query = em.createQuery("select a from Actor a where a.name=:name");
    query.setParameter("name", "Tom Cruise");
    List<Actor> actors = query.getResultList();

Native Queries

Native queries in JPA allow you to run queries directly in the query language supported by the underlying database. For Neo4j, Kundera supports only Lucene queries as of now.

Running Lucene Queries

You can run Lucene queries directly on Lucene indexes (both manual and auto). Here is how this works:

    EntityManagerFactory emf = Persistence.createEntityManagerFactory("neo4j_persistence_unit");
    EntityManager em = emf.createEntityManager();
    Query query = em.createNativeQuery("ACTOR_NAME:Tom", Actor.class);
    List<Actor> actors = query.getResultList();
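The query string above is in Lucene's field:term syntax, where the field is the column name from the entity mapping. When the value contains spaces or comes from user input, it usually needs to be wrapped in double quotes as a phrase, with any embedded backslashes and double quotes escaped. The helper below is a hypothetical sketch of that step (the names `LuceneQuerySketch` and `phrase` are not part of Kundera or Lucene); whether a phrase query matches depends on how the field was indexed, which this page does not specify.

```java
// Hypothetical helper, not part of Kundera or Lucene: builds a field:"phrase" query string.
class LuceneQuerySketch
{
    // Quotes the value and escapes the two characters that are special inside a quoted phrase.
    static String phrase(String field, String value)
    {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args)
    {
        System.out.println(phrase("ACTOR_NAME", "Tom Cruise")); // prints ACTOR_NAME:"Tom Cruise"
    }
}
```

The resulting string could then be passed to em.createNativeQuery(..., Actor.class) in place of the literal "ACTOR_NAME:Tom".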

Running Cypher Queries

While you can't run Cypher queries directly through createNativeQuery() as of now, you can run them by getting a handle to the GraphDatabaseService.

    EntityManagerFactory emf = Persistence.createEntityManagerFactory("neo4j_persistence_unit");
    EntityManager em = emf.createEntityManager();
    // The delegate is a map from persistence unit name to Kundera client
    Map<String, Client> clients = (Map<String, Client>) em.getDelegate();
    Neo4JClient client = (Neo4JClient) clients.get("neo4j_persistence_unit");
    GraphDatabaseService graphDb = client.getConnection();
    ExecutionEngine engine = new ExecutionEngine(graphDb);
    ExecutionResult result = engine.execute("My Cypher Query String");
    em.close();
    emf.close();

Running Gremlin Queries

Support for Gremlin queries is proposed for later Kundera releases.

Limitations

Some limitations and features yet to be added are:

  1. Polyglot persistence works only when the owning-side entity is stored in Neo4j.
  2. Because of the very nature of graphs, only Many-To-Many relationships are supported, for the sake of simplicity. Support for other relationships may be added later.
  3. Unidirectional relationships have not been tested and are therefore not supported as of now.
  4. Only JPA and Lucene queries are supported as of now. Cypher and Gremlin are proposed to be added later.
  5. Neo4j runs in embedded mode only. REST client support is proposed for later releases.