Luca Garulli edited this page Feb 9, 2015 · 26 revisions


Users should read the Official Documentation. This wiki contains the source of documentation for editors only (links are broken if you read from GitHub).

The OrientDB-ETL module moves data into and out of OrientDB by executing an ETL (Extract, Transform, Load) process. OrientDB ETL is based on the following principles:

  • one configuration file in JSON format
  • one Extractor, which extracts data from a source
  • one Loader, which loads data to a destination
  • multiple Transformers that transform data in a pipeline. Each one receives an input, processes it, and returns an output that the next component takes as its input
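
The principles above can be illustrated with a minimal Python sketch. It is only a conceptual model of the extractor, transformer chain, and loader, not the OrientDB-ETL implementation or its API; the names `extract`, `split_csv`, `upper_name`, and `run_pipeline`, as well as the sample data, are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict
Transformer = Callable[[Record], Record]


def extract(lines: Iterable[str]) -> Iterator[Record]:
    """Hypothetical extractor: wraps each raw input line in a record."""
    for line in lines:
        yield {"raw": line.rstrip("\n")}


def split_csv(record: Record) -> Record:
    """Hypothetical transformer: splits the raw line into two named fields."""
    name, city = record["raw"].split(",")
    return {"name": name, "city": city}


def upper_name(record: Record) -> Record:
    """Hypothetical transformer: upper-cases the name field."""
    return {**record, "name": record["name"].upper()}


def run_pipeline(lines: Iterable[str],
                 transformers: list[Transformer],
                 load: Callable[[Record], None]) -> None:
    """One extractor feeds a chain of transformers, then one loader."""
    for record in extract(lines):
        # The output of each transformer is the input of the next one.
        for transform in transformers:
            record = transform(record)
        load(record)


# A plain list stands in for the destination database in this sketch.
destination: list[Record] = []
run_pipeline(["ada,London", "luca,Rome"], [split_csv, upper_name], destination.append)
print(destination)
```

In this model the order of the transformers matters: `upper_name` only works because `split_csv` has already produced the `name` field.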

## How ETL works


Example of a process that extracts data from a CSV file, applies some changes, looks up whether the record has already been created, and then stores the record as a document in an OrientDB database:

|           |              PIPELINE             |
+ EXTRACTOR +-----------------------+-----------+
|           |     TRANSFORMERS      |  LOADER   |
|   FILE   ==>  CSV->FIELD->MERGE  ==> OrientDB |

The pipeline, which consists of the transformation and loading phases, can run in parallel when the configuration contains {"parallel":true}.
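
As a rough illustration of what running the pipeline in parallel means, the Python sketch below keeps extraction sequential and hands each record's transform-and-load work to a thread pool. This is an assumption-laden model, not how OrientDB-ETL is implemented: the `run` and `transform_and_load` functions, the `parallel` argument, and the pool size of 4 are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def transform_and_load(record: Any,
                       transformers: list[Callable[[Any], Any]],
                       load: Callable[[Any], None]) -> None:
    """The pipeline part: apply every transformer in order, then load."""
    for transform in transformers:
        record = transform(record)
    load(record)


def run(records: Iterable[Any],
        transformers: list[Callable[[Any], Any]],
        load: Callable[[Any], None],
        parallel: bool = False) -> None:
    """Extraction (iterating over records) stays sequential in both modes."""
    if not parallel:
        for record in records:
            transform_and_load(record, transformers, load)
        return
    # Hypothetical pool size; each record's pipeline runs on a worker thread.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(transform_and_load, r, transformers, load)
                   for r in records]
        for future in futures:
            future.result()  # re-raises any exception from a worker


loaded: list[int] = []
run(range(5), [lambda n: n * 2], loaded.append, parallel=True)
# Load order is not guaranteed in this sketch, so sort before printing.
print(sorted(loaded))
```

In this sketch the loader must be safe to call from several threads at once, and records may reach the destination in a different order than they were extracted.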

## Installation

Starting from OrientDB v2.0, the ETL module will be bundled with the official release. To use it, follow these steps:

  • Clone the repository on your computer, by executing:
      git clone
  • Compile the module, by executing:
      mvn clean install
  • Copy script/ (or .bat under Windows) to $ORIENTDB_HOME/bin
  • Copy target/orientdb-etl-2.0-SNAPSHOT.jar to $ORIENTDB_HOME/lib


$ ./ config-dbpedia.json

## Available Components

