This project integrates the components developed within the EU MONDO project.
## Overall structure and dependencies
The main artefacts developed within this project are:

- `uk.ac.york.mondo.integration.server.product`, an Eclipse product for a Jetty application server that hosts the integrated MONDO components and provides Apache Thrift-based APIs for accessing them remotely. Clients targeting these APIs are included in the `uk.ac.york.mondo.integration.updatesite` Eclipse P2 update site (GUI-based clients and OSGi console extensions) and in the `uk.ac.york.mondo.integration.clients.cli.product` Eclipse product (command-line client).
- `uk.ac.york.mondo.integration.eclipse.product`, an Eclipse product based on a custom Mars SR.1 distribution that includes all the Eclipse-based tools in the MONDO project.
The project has the following external dependencies:
- Most of the OSGi-ready dependencies come from the `uk.ac.york.mondo.integration.targetplatform` target platform definition. Due to conflicting licenses, developers are expected to build the Hawk GPL update site on their own and edit the target platform to refer to their locally built archive. Please refer to `mondo-hawk/README.md` for instructions. The Hawk GPL update site must be served through a local web server on port 8000: in most UNIX environments, changing to its directory and running `python -m SimpleHTTPServer 8000` should be enough.
- The `atl-mr` Eclipse project from the `bluezio/integrate-hawk-emf` branch of the ATL/MapReduce project.
- Some of the plugins in the `org.eclipse.atl.atlMR` project (see the CloudATL section for details).
- The Apache Artemis core client and server libraries. These are listed in `uk.ac.york.mondo.integration.artemis/ivy.xml`, so they are easy to download using IvyDE.
Compiled versions of the remote client components are available as an Eclipse update site and as a Maven repository here.
We recommend using Eclipse Modeling (Luna or later).
- Go to the MONDO update site and install the fix for SNI updates.
- Import all `uk.ac.york.mondo.integration.*` projects from this repository.
The dependencies for the project using Artemis are located in `uk.ac.york.mondo.integration.artemis`. There are two ways to fetch the dependencies:

1. Use the Apache IvyDE plug-in. Install everything from the IvyDE update site, right-click the `fetch-deps.xml` file and choose "Run As > Ant Build". If this does not work, go to option 2.
2. Use the command-line tool. Install the `ivy` Ubuntu package and issue the following command in the repository:

   ```
   ant -Dnative-package-type=jar -lib /usr/share/java/ivy.jar -f uk.ac.york.mondo.integration.artemis/fetch-deps.xml
   ```

   You may have to run the Ant job twice for it to succeed.
## Setting the target platform
Before attempting to resolve the target platform, don't forget to start the local web server from the `mondo-hawk` checkout:

```
cd mondo-hawk/org.hawk.updatesite/hawk-gpl
python -m SimpleHTTPServer 8000
```
Go to the `uk.ac.york.mondo.integration.targetplatform` project, open the `uk.ac.york.mondo.thrift.osgi.example.targetplatform.target` Target Definition file and click "Set as Target Platform".
The Thrift APIs are defined in `uk.ac.york.mondo.integration.api`, particularly in the `src/api.emf` Emfatic file. Emfatic produces an `.ecore` file from it, which we transform into the `api.thrift` Thrift IDL file using the ecore2thrift plugin. The Thrift compiler then uses this IDL to generate the client and server code. The `.api` project also provides some utility methods and classes in its `api.utils` package, which is written by hand rather than generated. It is recommended to use `APIUtils` to connect to the Thrift APIs instead of using the Thrift classes directly.
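For instance, a minimal client could look like the sketch below. The `connectTo` helper, the nested `ThriftProtocol` enum and the `listInstances` operation shown here are assumptions: check `api.utils` and the generated API code for the exact names and signatures.

```java
import uk.ac.york.mondo.integration.api.Hawk;
import uk.ac.york.mondo.integration.api.utils.APIUtils;
import uk.ac.york.mondo.integration.api.utils.APIUtils.ThriftProtocol;

public class HawkClientSketch {
	public static void main(String[] args) throws Exception {
		// Assumed helper: builds the Thrift transport and protocol for the
		// endpoint, so we do not deal with the Thrift plumbing ourselves.
		Hawk.Client client = APIUtils.connectTo(
			Hawk.Client.class,
			"http://localhost:8080/thrift/hawk/tuple",
			ThriftProtocol.TUPLE);

		// List the Hawk instances available on the server.
		System.out.println(client.listInstances());
	}
}
```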
The MONDO Eclipse workbench product (`uk.ac.york.mondo.integration.eclipse.product`) runs on Mars SR.1, which has graphics corruption issues on some GNU/Linux distributions. GNU/Linux users are advised to use the provided `run-eclipse.sh` script, which sets the appropriate environment variable to avoid these issues.
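On GTK-based desktops, the usual workaround for these Mars rendering issues (and presumably what the script sets) is forcing SWT to fall back to GTK2 before launching:

```
export SWT_GTK3=0
```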
The MONDO server product (`uk.ac.york.mondo.integration.server.product`) consists of several servlets that implement the Thrift APIs for the various MONDO components, plus two customizations of the standard OSGi HttpService: `uk.ac.york.mondo.integration.server.logback` binds SLF4J to the Logback library, and `uk.ac.york.mondo.integration.server.gzip` adds gzip compression to all HTTP responses coming from the server.
When deploying the server in a production environment, it is important to set up the secure store correctly, as it keeps the usernames and passwords of all the VCS that Hawk indexes. This takes two steps (on Linux, they are automated when using the provided startup script):

1. The secure store must be placed somewhere no other program will try to access it concurrently. This can be done by editing `mondo-server.ini` and pointing the `-eclipse.keyring` option to a dedicated file:

   ```
   -eclipse.keyring
   /path/to/keyringfile
   ```

   For added security, that path should only be readable by the user running the server.
2. An encryption password must be set. On Windows and Mac, the available OS integration should be enough. On Linux, two lines have to be added at the beginning of the `mondo-server.ini` file, specifying the path to a password file:
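   A minimal sketch, using the standard Equinox `-eclipse.password` launcher option (the path is an example):

   ```
   -eclipse.password
   /path/to/password
   ```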
   A password file with 100 bytes of random data can be created with these commands:

   ```
   head -c 100 /dev/random | base64 > /path/to/password
   chmod 400 /path/to/password
   ```
On startup, the server checks that the secure store has been set up properly: if you get a warning that encryption is not available, you will need to revise your setup.
Another important detail for production environments is turning on security. It is disabled by default to help with testing and initial evaluations, but it can be enabled by running the server once, shutting it down, editing the `shiro.ini` file appropriately (the relevant sections include comments on what to do) and switching the relevant option to `true` in the `mondo-server.ini` file. The MONDO server uses an embedded MapDB database, which is managed through the Users Thrift API. Once security is enabled, all Thrift APIs and all external (not in-VM) Artemis connections become password-protected.
If you enable security, you might want to ensure that `-Dhawk.tcp.port` is not present in the `mondo-server.ini` file, since the Hawk TCP port does not support security (for the sake of raw performance).
If you are deploying the server across a network, you will need to edit the `mondo-server.ini` file and customize the `hawk.artemis.host` line with the host that the Artemis server should listen on: normally, the IP address or hostname of the MONDO server in the network. The Thrift API also uses this hostname in its replies to the `watchModelChanges` operation of the Hawk API.
Additionally, if the server IP is dynamic but has a consistent DNS name (e.g. an Amazon VM plus a dynamic DNS provider), we recommend setting the "listen on all interfaces" option to `true` (so the Artemis server will keep listening on all interfaces even if the IP address changes) and using the DNS name for `hawk.artemis.host` instead of a literal IP address.
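For example, the relevant line in the `-vmargs` section of `mondo-server.ini` might look like this (the hostname is a placeholder):

```
-Dhawk.artemis.host=mondo.example.com
```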
Finally, production environments should enable and enforce SSL as well, since plain HTTP is insecure. The Linux products include a shell script that generates simple self-signed key/trust stores and indicates which Java system properties should be set on the server and the client.
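The generated script indicates the exact Java system properties to set; as a sketch, assuming the standard JSSE properties, the server side (in the `-vmargs` section of `mondo-server.ini`) would use:

```
-Djavax.net.ssl.keyStore=/path/to/server-keystore.jks
-Djavax.net.ssl.keyStorePassword=changeit
```

and the client side:

```
-Djavax.net.ssl.trustStore=/path/to/truststore.jks
-Djavax.net.ssl.trustStorePassword=changeit
```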
Hawk is integrated through its Thrift API, which is implemented in the `uk.ac.york.mondo.integration.hawk.servlet` plugin as standard OSGi HttpService servlets. The plugin takes care of saving and reloading the created Hawk instances, and of starting and stopping the embedded Apache Artemis messaging server.
On top of this, the integration project includes the following clients for the Hawk Thrift APIs:
- `uk.ac.york.mondo.integration.hawk.cli` maps the Thrift API to the OSGi console (use `hawkhelp` to see the available commands).
- `uk.ac.york.mondo.integration.hawk.emf` provides an EMF Resource implementation that allows treating a Hawk index (or a part of it) as a remote read-only model.
- `uk.ac.york.mondo.integration.hawk.remote.thrift` extends Hawk's Eclipse UI with an additional `IModelIndexer` implementation, so it can manage and access remote Hawk indexes in the same way as local ones.
## Hawk EMF Resource
Access details for Hawk model indexes are saved as `.hawkmodel` files or provided through `hawk+http(s)://` URLs. Both can be created with the provided `.hawkmodel` editor. `.hawkmodel` files can then be opened as models with the sample Ecore reflective editor or with the Epsilon Exeed editor.
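Since the driver exposes the index through the standard EMF Resource API, a `.hawkmodel` file can also be loaded programmatically. A minimal sketch, assuming the `hawk.emf` plugin is installed and has registered its `Resource.Factory` for the `hawkmodel` extension:

```java
import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;

public class LoadHawkModel {
	public static void main(String[] args) {
		// Standard EMF loading: the .hawkmodel descriptor behaves like any
		// other model resource, but its contents come from the remote index.
		ResourceSet resourceSet = new ResourceSetImpl();
		Resource resource = resourceSet.getResource(
			URI.createFileURI("/path/to/index.hawkmodel"), true);
		for (EObject root : resource.getContents()) {
			System.out.println(root.eClass().getName());
		}
	}
}
```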
`.hawkmodel` files are Java property files, so their file format is quite simple. They define the following options (an illustrative descriptor is sketched after this list):
- Server URL, instance name and Thrift protocol: these are required to contact the Thrift API. `hawk.servlet` provides one servlet per protocol under `/thrift/hawk` (e.g. `/thrift/hawk/tuple`): they are all backed by the same storage and logic, but they use different Thrift protocols with varying degrees of language compatibility and efficiency. `json` works for all languages but is the least efficient, `compact` takes up less space at the expense of some extra processing (but only works for a reduced set of languages), and `tuple` is the most space-efficient but only works for Java.
- Pattern for the version control repositories whose contents we want to see (only `*` is supported as a wildcard at the moment).
- Comma-separated filename patterns for the files whose contents we want to see (again, only `*` is supported at the moment).
- Loading mode: from most eager to least, `GREEDY` requests all model elements at once, `LAZY_ATTRIBUTES` requests all model elements without attributes and then fetches attributes on the fly, `LAZY_CHILDREN` requests only the roots and then fetches all reference fields of a node once a reference is accessed, and `LAZY_REFERENCES` requests only the roots and fetches single reference fields once they are accessed. It is also possible to combine `LAZY_ATTRIBUTES` with the other two lazy modes (e.g. `LAZY_ATTRIBUTES_REFERENCES`). For very large models (with millions of elements), `LAZY_CHILDREN` has performed best so far for browsing models (when combined with the Epsilon Exeed editor).
- By checking the "Subscribe" box, the resource will ask Hawk to feed an Artemis queue with events about changes in the indexed models. This queue will be used by the resource to update its local view of the model incrementally, on the fly. A unique client ID must be provided in order to support reconnections and durable queues. The durability of the queue can be `DURABLE` (survives reconnections and server restarts) or `TEMPORARY` (removed after disconnecting).
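As mentioned above, a descriptor might look like the sketch below. The key names are hypothetical, chosen only to illustrate the options: use the provided editor to generate real files.

```properties
# Illustrative sketch only: the key names below are hypothetical.
hawk.serverURL=http://mondo-server:8080/thrift/hawk/tuple
hawk.instanceName=myindex
hawk.thriftProtocol=TUPLE
hawk.repository=*
hawk.filePatterns=*.xmi,*.uml
hawk.loadingMode=LAZY_CHILDREN
hawk.subscribe=true
hawk.clientID=client-42
hawk.durability=DURABLE
```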
CloudATL (also known as ATL/MapReduce) is integrated in a similar way to Hawk: `fr.inria.atlanmod.mondo.integration.cloudatl.servlet` implements the Thrift API, which is exposed by `fr.inria.atlanmod.mondo.integration.cloudatl.cli` as a set of OSGi console commands.
## Setting up a trivial Hadoop cluster for testing
The CloudATL servlet works as a frontend node for a Hadoop cluster, which must have been set up in advance. The `conf` folder of the `.servlet` project provides an example of what the configuration would look like for a trivial one-node cluster. Using Docker, it is quite simple to start a one-node pseudo-distributed Hadoop cluster: install Docker, make sure you have over 10% of free disk space (required by Hadoop to start a node) and issue the following command:

```
sudo docker run -it bluezio/hadoop-jh /etc/bootstrap.sh -bash
```
After this, the `conf/*.xml` files should be updated to reflect the IP address of the Docker instance. `hdfs-site.xml` should also be edited to provide valid local directories writable by the user running the MONDO server. When running the server, the `HADOOP_USER_NAME` environment variable should be set to `root` (the username running Hadoop in the Docker instance) and `HADOOP_CONF_DIR` should be set to the absolute path of the `conf` directory mentioned above.
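For example, assuming the example `conf` folder inside the servlet project is used as-is (adjust the path to your checkout):

```
export HADOOP_USER_NAME=root
export HADOOP_CONF_DIR=/path/to/fr.inria.atlanmod.mondo.integration.cloudatl.servlet/conf
```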
NOTE: pseudo-distributed Hadoop clusters are only meant for quick test runs. To obtain actual performance gains, users will need to set up more realistic Hadoop clusters on their own. More advanced Hadoop configurations are outside of the scope of this document.
## Running CloudATL jobs
Once set up, a CloudATL job can be run from the OSGi console with two commands (the full list is available through the CLI's help command):

```
cloudatlconnect http://mondo_server_ip:port/thrift/cloudatl
cloudatllaunch emftvm_url sourcemetamodel_url targetmetamodel_url sourcemodel_url targetmodel_url
```
The supported URL schemes are `hdfs://` (for files that have been previously uploaded to the Hadoop Distributed File System, using e.g. `hdfs dfs -put`) and `hawk+http://` (for models indexed by Hawk: these URLs are produced by the `.hawkmodel` editor mentioned above).
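For example, a compiled EMFTVM transformation and a source model could be uploaded with the standard Hadoop CLI before launching (the file names are placeholders):

```
hdfs dfs -put transformation.emftvm /user/root/transformation.emftvm
hdfs dfs -put source.xmi /user/root/source.xmi
```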
## Building the `.jar` files for Hadoop
The CloudATL servlet needs up-to-date builds of ATL/MapReduce and the Hawk EMF driver in its `libs` folder, so it can send them to Hadoop for processing. Building the `.jar` files requires these steps:
- Download and install a recent version of Eclipse and a JDK for Java 7.
- Go to "Window > Preferences > Java > Compiler" and change the default "Compiler compliance level" to 1.7. This is needed for the Docker image above, which comes with Java 7.
- Go to "Help > Install New Software..." and install everything from IvyDE through its update site. Let Eclipse restart.
- Go to the Git perspective with "Window > Perspective > Open Perspective > Other... > Git".
- Clone the `https://github.com/bluezio/ATL_MR` Git repository and import its projects. This can be done by copying the URL to the clipboard, right-clicking on the "Git Repositories" view and selecting "Paste Repository Path or URI". Make sure to check the "Import all existing Eclipse projects" box on the last step of the wizard.
- Clone the `https://github.com/atlanmod/org.eclipse.atl.atlMR.git` repository, but do not import all projects. Instead, uncheck the box, let the clone finish, and right-click on the "plugins" folder within "Working Directory", selecting the "Import Projects..." menu entry.
- Close all projects except for `org.eclipse.m2m.atl.emftvm.trace`, without letting it open any referenced projects.
- Right-click on `atl-mr` in the "Package Explorer" view and select "Export... > Ant Buildfiles".
- Right-click on the generated `build.xml` file in the "Package Explorer" view and select "Run As > Ant Build...". Select the `dist-emftvm` configuration, and make sure in the "JRE" tab that it runs in the same JRE as the workspace.
- Refresh the `atl-mr` project by right-clicking on it in the "Package Explorer" view and selecting "Refresh".
- Again, right-click on the generated `build.xml` file, but this time select the `dist` configuration. It does not need to run in the same JRE as the workspace, but nevertheless check in the "JRE" tab that a valid JRE has been selected.
- Refresh the project again, and we're done: the binary distribution will be available inside the refreshed `atl-mr` project.