markus1978 edited this page Mar 4, 2013 · 14 revisions

Synopsis

EMF-Fragments is an Eclipse Modeling Framework (EMF) persistence layer for distributed data stores such as NoSQL databases (e.g. MongoDB, Hadoop/HBase) or distributed file systems.

What Problem Does It Solve?

[EMF](http://www.eclipse.org/emf "Eclipse Modeling Framework") is designed to programmatically create, edit, and analyze software models. It provides generated type-safe APIs to manipulate models based on a schema (i.e. a metamodel). This is similar to XML and XML schemas with JAX-like APIs. EMF works fine as long as you use it for software models and your models fit into main memory. If you use EMF for other types of data, e.g. sensor data, geo data, or social data, you soon run out of main memory and things become more complicated.

Why would you use EMF for data other than software models? EMF provides very good generated APIs, generated GUI tools to visualize data, and a series of strong query and model transformation languages, all of which can be applied to structured data at large. Data in EMF is described through metamodels similar to XML schemas or entity-relationship diagrams, but EMF is open as to how data is actually stored internally. Thus, EMF can provide a common abstraction layer for different technologies (e.g. relational databases, XML files, NoSQL databases).

To use large models in EMF, we need a way to persist models that does not require loading complete models into memory. Existing solutions include object-relational mappings (ORM), e.g. Eclipse's CDO. These solutions have drawbacks:

  1. ORM mappings store data slowly because data is indexed and stored very finely grained
  2. ORM mappings are slow when data structures are traversed, since data is loaded piece by piece even though it is used in larger aggregates
  3. It is hard to distribute SQL databases

EMF-Fragments is designed to store large object-oriented data models (typed, labeled, bidirectional graphs) efficiently and scalably. It emphasizes fast storage of new data and fast navigation of existing data structures. The requirements for this framework come from storing and analyzing large amounts of sensor data in real time.

How Does EMF Fragments Work?

EMF-Fragments differs from frameworks based on object-relational mappings (ORM) such as Connected Data Objects (CDO). While ORMs map single objects, attributes, and references to database entries, EMF-Fragments maps larger chunks of a model (fragments) to URIs. This allows it to store models on a wide range of distributed data stores, including distributed file systems and key-value stores (think NoSQL databases like MongoDB or HBase). This also prepares EMF models for cloud computing paradigms such as Map/Reduce.

[Figure: example fragmentation]

The EMF-Fragments framework fragments models automatically and transparently in the background. Clients designate the types of references at which models are fragmented. This lets clients control fragmentation without having to trigger it programmatically. Fragments are managed automatically: when you create, delete, move, or edit model elements, new fragments are created and elements are distributed among those fragments on the fly. Fragments (internally realized as resources) are identified by URIs. The framework lets you map URIs to (distributed) data stores (e.g. NoSQL databases or distributed file systems).
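The fragmentation rule described above can be illustrated with a small, self-contained sketch. It does not use EMF or EMF-Fragments: `ToyNode` and `ToyFragmenter` are hypothetical names invented for this illustration, and the real framework works on EMF resources rather than on plain lists. The sketch only shows the idea that elements reached through a regular containment reference stay in their container's fragment, while elements reached through a reference designated as fragmenting become the root of a new fragment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a model element; not an EMF-Fragments class. */
final class ToyNode {
    final String name;
    // Plain containment: children share the fragment of their container.
    final List<ToyNode> regularContents = new ArrayList<>();
    // Containment designated as fragmenting: each child starts a new fragment.
    final List<ToyNode> fragmentedContents = new ArrayList<>();

    ToyNode(String name) {
        this.name = name;
    }
}

/** Assigns every element of a containment tree to a fragment (hypothetical helper). */
final class ToyFragmenter {
    private final Map<Integer, List<String>> fragments = new LinkedHashMap<>();
    private int nextFragmentId = 0;

    /** Returns fragment id -> names of the elements stored in that fragment. */
    static Map<Integer, List<String>> fragment(ToyNode root) {
        ToyFragmenter fragmenter = new ToyFragmenter();
        fragmenter.visit(root, fragmenter.newFragment());
        return fragmenter.fragments;
    }

    private int newFragment() {
        int id = nextFragmentId++;
        fragments.put(id, new ArrayList<>());
        return id;
    }

    private void visit(ToyNode node, int fragmentId) {
        fragments.get(fragmentId).add(node.name);
        for (ToyNode child : node.regularContents) {
            visit(child, fragmentId);      // stays with its container
        }
        for (ToyNode child : node.fragmentedContents) {
            visit(child, newFragment());   // becomes the root of its own fragment
        }
    }
}
```

In this toy version the client controls fragmentation only by choosing which list a child is added to, which mirrors the idea of designating reference types instead of triggering fragmentation by hand.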

How Is EMF Fragments Used?

Using EMF-Fragments is simple if you are used to EMF. You create EMF metamodels as usual, e.g. with Ecore. You generate APIs and tools as usual using normal genmodels, but with three specific settings:

  1. You have to configure your genmodels to use reflective feature delegation.
  2. You have to use a specific base class: FObjectImpl
  3. You have to enable Containment Proxies

You use the generated APIs and tools as usual. The only difference is that you use resource URIs that EMF-Fragments recognizes. EMF-Fragments supports different URI schemes for different databases:

  1. memory://<model_id>, in-memory persistence to test EMF-Fragments
  2. mongodb://<host>/<model_id>, stores models in MongoDB
  3. hbase://<host>/<model_id>, stores models in HBase
  4. file://<directory>, stores models in (distributed) file systems
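To make the scheme list above concrete, here is a minimal sketch of how a URI scheme could be mapped to a backend. It uses only `java.net.URI` from the JDK. `StoreKind` and `StoreUris` are hypothetical names for this illustration and are not part of EMF-Fragments, which may resolve URIs differently.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical backend kinds, one per URI scheme in the list above. */
enum StoreKind { MEMORY, MONGODB, HBASE, FILE }

final class StoreUris {
    /** Maps the scheme of a model URI to a backend kind. */
    static StoreKind kindOf(String modelUri) {
        String scheme = URI.create(modelUri).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + modelUri);
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "memory":  return StoreKind.MEMORY;
            case "mongodb": return StoreKind.MONGODB;
            case "hbase":   return StoreKind.HBASE;
            case "file":    return StoreKind.FILE;
            default:
                throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
    }

    /**
     * Returns the path without its leading slash. For URIs of the form
     * scheme://host/model_id this is the model id; for file URIs it is
     * simply the directory path without the leading slash.
     */
    static String modelIdOf(String modelUri) {
        String path = URI.create(modelUri).getPath();
        return (path == null || path.isEmpty()) ? "" : path.substring(1);
    }
}
```

For example, `StoreUris.kindOf("mongodb://localhost/sensors")` yields `StoreKind.MONGODB`, and `StoreUris.modelIdOf` on the same string yields `sensors`.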

To actually have your models fragmented, you need to annotate your meta-model. There are different possibilities:

  1. You mark a containment reference with de.hub.emffrag:fragments->true. This creates containment proxies: references are persisted at the owner as URIs, and each value becomes the root object of its own fragment.
  2. You mark a containment reference with de.hub.emffrag:indexes->true. This creates an internal indexed value-set. The resulting EList is limited, but you can add values and iterate the list. The advantage is that no proxy URIs have to be stored for the owner. Thus, this scales even for very large value-sets.
  3. You mark a non-containment reference with de.hub.emffrag:indexes->true. This is similar to the previous option, but the values are not stored in separate fragments.
  4. You use one of the provided abstract index classes, IndexedMap or IndexedList. This is similar to the previous two options, but gives you a direct interface rather than a limited EList. IndexedMap allows you to realize maps with arbitrary key types, but you have to provide an order-preserving mapping from that key type to byte[].
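The last option requires an order-preserving mapping from the key type to byte[]. As a sketch of what such a mapping can look like, the following class encodes signed `long` keys so that comparing the resulting byte arrays byte by byte (as unsigned values, which is how sorted key-value stores such as HBase order their keys) gives the same order as comparing the numbers. `OrderPreservingKeys` is a hypothetical helper written for this illustration. How the mapping is plugged into IndexedMap is not shown here, because that API is not described on this page.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper: order-preserving encoding of signed long keys. */
final class OrderPreservingKeys {
    /** Encodes a long so that unsigned byte-wise order equals numeric order. */
    static byte[] encode(long key) {
        // Flipping the sign bit places negative values below positive ones.
        // ByteBuffer writes big-endian by default, so the most significant
        // byte comes first and dominates the lexicographic comparison.
        return ByteBuffer.allocate(Long.BYTES).putLong(key ^ Long.MIN_VALUE).array();
    }

    /** Inverse of encode. */
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    /** Lexicographic comparison that treats each byte as an unsigned value. */
    static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

A plain big-endian encoding without the sign-bit flip would sort negative keys after positive ones, and a little-endian encoding would not preserve order at all, which is why both details matter here.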

Architecture

[Figure: architecture]

Hello World Example

This example is part of the de.hub.emffrag.tests Eclipse project, which you can find within the EMF-Fragments sources. For this Hello World example we use a very simple meta-model:

[Figure: example meta-model]

The following code demonstrates how to initialize EMF-Fragments, how to create a model, and how to traverse it:

```java
// EMF core imports; EmfFragActivator, FragmentedModel, and the generated
// TestModel* classes come from the EMF-Fragments projects.
import org.eclipse.emf.common.util.TreeIterator;
import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;

// necessary if you use EMF outside of a running Eclipse environment
EmfFragActivator.standalone(TestModelPackage.eINSTANCE);

// initialize your model
Resource resource = new ResourceSetImpl().createResource(URI.createURI("memory://localhost/test"));

// create the model as usual
TestObject testContainer = TestModelFactory.eINSTANCE.createTestObject();
testContainer.setName("Container");
resource.getContents().add(testContainer);

TestObject testContents = TestModelFactory.eINSTANCE.createTestObject();
TestObject testFragmentedContents = TestModelFactory.eINSTANCE.createTestObject();

testContents.setName("Hello Old World!");
testFragmentedContents.setName("Hello New World!");

// regularContents is a plain containment reference,
// fragmentedContents is the reference annotated for fragmentation
testContainer.getRegularContents().add(testContents);
testContainer.getFragmentedContents().add(testFragmentedContents);

// call save to force save of cached and unsaved parts of your model
// before exiting the JVM
resource.save(null);

System.out.println("Key value store contents: ");
System.out.println(((FragmentedModel)resource).getDataStore());

// to read a model, initialize the environment as before
// and initialize your model
resource = new ResourceSetImpl().createResource(URI.createURI("memory://localhost/test"));

// navigate the model as usual
System.out.println("Iterate results: ");
TreeIterator<EObject> allContents = resource.getAllContents();
while (allContents.hasNext()) {
    System.out.println(allContents.next());
}
```

The result should be something like this:

```
Key value store contents:
memory://localhost/test
key: 102 95 0 0 0 0 0 0 0 0 , URI: memory://localhost/test/Zl8AAAAAAAAAAA
value: <?xml version="1.0" encoding="UTF-8"?>
<tm:TestObject xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI" xmlns:tm="http://hu-berlin.de/sam/emfhbase/testmodel" name="Container">
  <regularContents name="Hello Old World!"/>
  <fragmentedContents href="Zl8AAAAAAAAAAQ#/"/>
</tm:TestObject>

key: 102 95 0 0 0 0 0 0 0 1 , URI: memory://localhost/test/Zl8AAAAAAAAAAQ
value: <?xml version="1.0" encoding="UTF-8"?>
<tm:TestObject xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI" xmlns:tm="http://hu-berlin.de/sam/emfhbase/testmodel" name="Hello New World!"/>

Iterate results:
Container
Hello Old World!
Hello New World!
```

As you can see, the object added to the fragmentedContents reference was stored in its own fragment. The object added to the regularContents reference was stored in the same fragment as its container. The fragmentedContents reference was annotated with de.hub.emffrag:fragments->true; the regularContents reference was not.
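The keys and URIs in the output listing are related in a way that can be reproduced with the JDK alone. The ten key bytes are consistent with the ASCII bytes of `f_` (102 and 95) followed by an eight-byte big-endian counter, and the last URI segment is consistent with an unpadded Base64 encoding of those bytes. This is an inference from the listing, not a statement about the framework's internals. `FragmentKeyDemo` is a hypothetical class for this illustration, and the choice of the URL-safe Base64 alphabet is an assumption, since the listing contains no character that would distinguish it from the standard alphabet.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

/** Hypothetical demo: rebuilds keys shaped like those in the output listing. */
final class FragmentKeyDemo {
    /** Key layout assumed from the listing: 'f', '_', then a big-endian long. */
    static byte[] key(long index) {
        return ByteBuffer.allocate(2 + Long.BYTES)
                .put((byte) 'f')
                .put((byte) '_')
                .putLong(index)
                .array();
    }

    /** Encodes key bytes as an unpadded, URL-safe Base64 string. */
    static String uriSegment(byte[] key) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(key);
    }
}
```

With these assumptions, `uriSegment(key(0))` and `uriSegment(key(1))` reproduce the last segments of the two fragment URIs shown in the listing.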