slashdotted edited this page Aug 30, 2017 · 1 revision

Poma Pipeline

Poma is a middleware and framework for building distributed data processing pipelines. To install Poma, follow the instructions published at https://github.com/slashdotted/PomaPure.

Getting started

Poma aims to be generic and easily customizable. Hence you will probably need to configure the framework to suit your needs, for example by creating new processing modules or by changing the type of data exchanged within a pipeline. There are nonetheless some basic concepts that need to be understood:

  • data processing is achieved by means of pipelines which are executed by Poma
  • each pipeline describes a data processing flow made of modules and links between modules
  • modules can be either sources (if they generate some data) or processors (if they get some data as input and produce an output)
  • the unit of data exchanged by modules is called a packet
  • each packet contains some metadata (in a JSON-like format) and a payload which can be customized
  • packets travel on the pipeline following the links between modules
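
The concepts above can be illustrated with a small Python sketch. It is only a conceptual model and not Poma's actual API: the names Packet, counter_source, doubler and run_chain are hypothetical and were chosen for this illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List


@dataclass
class Packet:
    """Hypothetical packet: JSON-like metadata plus a customizable payload."""
    metadata: Dict[str, Any] = field(default_factory=dict)
    payload: Any = None


def counter_source(count: int) -> Iterator[Packet]:
    """A source generates packets without needing any input."""
    for i in range(count):
        yield Packet(metadata={"seq": i}, payload=i)


def doubler(packet: Packet) -> Packet:
    """A processor gets a packet as input and produces a packet as output."""
    return Packet(
        metadata={**packet.metadata, "doubled": True},
        payload=packet.payload * 2,
    )


def run_chain(source: Iterable[Packet],
              processors: List[Callable[[Packet], Packet]]) -> List[Packet]:
    """Send every packet from the source through the processors, in order."""
    results = []
    for packet in source:
        for process in processors:
            packet = process(packet)
        results.append(packet)
    return results


results = run_chain(counter_source(3), [doubler])
for packet in results:
    print(packet.metadata, packet.payload)
```

In this sketch the links are implicit in the order of the processors list; a real pipeline describes them explicitly, so that a module can feed more than one successor.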

Pipelines are defined using a (very simple) custom JSON-based format, for example:

{
  "source" : "mysource",
  "modules" : {
     "mysource" : {
        "type" : "Dummy"
     },
     "mysink" : {
        "type" : "Dummy"
     }
  },
 "links" : [
   { "from" : "mysource", "to" : "mysink" }
 ]
}

This example defines a pipeline with two modules, mysource and mysink, both of type Dummy, connected by a single link. Module mysource is also declared as the source of the pipeline, meaning that data processing starts from that module.
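
Before handing a definition to Poma, it can be useful to check that it is internally consistent. The following Python sketch parses the example above and verifies that the source and every link endpoint refer to a defined module. The functions validate_pipeline and reachable_from_source are hypothetical helpers written for this page; they are not part of Poma, and Poma may apply different or additional checks.

```python
import json
from collections import deque

PIPELINE = """
{
  "source" : "mysource",
  "modules" : {
     "mysource" : { "type" : "Dummy" },
     "mysink" : { "type" : "Dummy" }
  },
  "links" : [
    { "from" : "mysource", "to" : "mysink" }
  ]
}
"""


def validate_pipeline(definition):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    modules = definition.get("modules", {})
    source = definition.get("source")
    if source not in modules:
        problems.append(f"source '{source}' is not a defined module")
    for name, module in modules.items():
        if "type" not in module:
            problems.append(f"module '{name}' has no type")
    for link in definition.get("links", []):
        for end in ("from", "to"):
            if link.get(end) not in modules:
                problems.append(f"link endpoint '{link.get(end)}' is not a defined module")
    return problems


def reachable_from_source(definition):
    """List module names in the order packets would reach them (breadth-first)."""
    adjacency = {}
    for link in definition.get("links", []):
        adjacency.setdefault(link["from"], []).append(link["to"])
    order, seen = [], set()
    queue = deque([definition["source"]])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        order.append(name)
        queue.extend(adjacency.get(name, []))
    return order


definition = json.loads(PIPELINE)
print(validate_pipeline(definition))
print(reachable_from_source(definition))
```

A module that is defined but never reached from the source would not appear in the second list, which is a quick way to spot a missing link.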

The type attribute of a module determines which plugin library is used to create an instance of the module. Plugin libraries normally reside in the Modules directory.

Customizing the packet type

(TODO)

Creating a new module

(TODO)

Executing the pipeline

(TODO)

Split the pipeline into multiple threads

(TODO)

Parallel data processing

(TODO)

Distributed data processing

(TODO)

Using the visual editor

(TODO)