
Creating your own operators

luposdate edited this page Jun 11, 2012 · 5 revisions

A very simple example of an operator is the lupos.engine.operators.multiinput.Union operator.

An operator has two main responsibilities. The first is initialization: computing which variables are surely bound by the operator and which variables may be bound by it (depending on the bound variables of its operands). By default, the intersection and the union of all variables of its operands are computed and stored in lupos.engine.operators.BasicOperator.intersectionVariables and lupos.engine.operators.BasicOperator.unionVariables. If the current operator surely binds a different set of variables, the Message preProcessMessage(BoundVariablesMessage msg) method can be overridden, as in the Union operator, where intersectionVariables is set in the message msg.
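The default computation of surely bound and possibly bound variables can be pictured as plain set operations. The sketch below is a minimal illustration, assuming that each operand is described only by the set of variable names it binds; plain `Set<String>` stands in for LUPOSDATE's variable types, and the class and method names are hypothetical, not the library's API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each operand is represented only by the variables it binds.
// LUPOSDATE keeps the corresponding sets in BasicOperator.intersectionVariables
// and BasicOperator.unionVariables; these helpers are illustrative only.
class BoundVariablesSketch {

    // Variables bound by every operand: intersection of all operand sets.
    static Set<String> intersectionOf(List<Set<String>> operandVariables) {
        if (operandVariables.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<String> result = new LinkedHashSet<>(operandVariables.get(0));
        for (Set<String> vars : operandVariables.subList(1, operandVariables.size())) {
            result.retainAll(vars); // keep only variables this operand binds as well
        }
        return result;
    }

    // Variables bound by at least one operand: union of all operand sets.
    static Set<String> unionOf(List<Set<String>> operandVariables) {
        Set<String> result = new LinkedHashSet<>();
        for (Set<String> vars : operandVariables) {
            result.addAll(vars);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> operands = List.of(Set.of("?s", "?p"), Set.of("?s", "?o"));
        System.out.println("intersection = " + intersectionOf(operands));
        System.out.println("union        = " + unionOf(operands));
    }
}
```

For a union of these two operands, only `?s` is bound in every result, while `?p` and `?o` may or may not be bound, which is why a Union-like operator reports the intersection as its surely bound variables.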

Of course, an operator has to compute a new result based on the results of its operands. This is usually done in the public QueryResult process(final QueryResult queryResult, final int operandID) method, where queryResult is the result of the operand identified by operandID. The left operand has operandID 0 and the right operand has operandID 1; operators with more than two operands use higher IDs.
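The role of operandID can be illustrated with two small operators. This is a hedged sketch, not LUPOSDATE code: a solution ("Bindings") is modeled as a `Map<String, String>` from variable to value, a "QueryResult" as a `List` of such maps, and the classes `SketchOperator`, `SketchUnion` and `SketchProjection` are hypothetical. The union-like operator can pass each operand result on unchanged because every solution of every operand belongs to its result; the projection has a single operand, so only operandID 0 is valid.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins: one solution is a variable -> value map and a result
// is a list of such maps. These are NOT LUPOSDATE's Bindings/QueryResult classes.
abstract class SketchOperator {
    // Called once per incoming operand result; operandID 0 = left, 1 = right, ...
    abstract List<Map<String, String>> process(List<Map<String, String>> queryResult, int operandID);
}

// Union-like operator: every incoming solution is part of the result,
// so each operand result can be forwarded as soon as it arrives.
class SketchUnion extends SketchOperator {
    private final int numberOfOperands;

    SketchUnion(int numberOfOperands) {
        this.numberOfOperands = numberOfOperands;
    }

    @Override
    List<Map<String, String>> process(List<Map<String, String>> queryResult, int operandID) {
        if (operandID < 0 || operandID >= numberOfOperands) {
            throw new IllegalArgumentException("unknown operand " + operandID);
        }
        return queryResult;
    }
}

// Unary projection: only operand 0 exists; keeps the listed variables of each solution.
class SketchProjection extends SketchOperator {
    private final Set<String> keep;

    SketchProjection(Set<String> keep) {
        this.keep = keep;
    }

    @Override
    List<Map<String, String>> process(List<Map<String, String>> queryResult, int operandID) {
        if (operandID != 0) {
            throw new IllegalArgumentException("projection has a single operand");
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> solution : queryResult) {
            Map<String, String> projected = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : solution.entrySet()) {
                if (keep.contains(e.getKey())) {
                    projected.put(e.getKey(), e.getValue());
                }
            }
            out.add(projected);
        }
        return out;
    }
}
```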

The result of the operator is usually returned directly by this method. However, for multi-input operators that are pipeline breakers, the results of the operands often must first be collected before they can be processed. In that case, the computation can be done in the public Message preProcessMessage(final EndOfEvaluationMessage msg) method (for an example, see lupos.engine.operators.multiinput.join.HashJoin).
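A pipeline-breaking join can be sketched as "collect in process, compute at end of evaluation". The class below is a hypothetical illustration, not LUPOSDATE's HashJoin: solutions are `Map<String, String>`, `endOfEvaluation()` merely plays the role of the preProcessMessage(EndOfEvaluationMessage) hook, and it assumes that the given join variables are exactly the variables the operands share and that every solution binds them (that is, they are surely bound).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a pipeline-breaking binary hash join.
// Assumption: joinVars are all variables shared by both operands and are bound in every solution.
class SketchHashJoin {
    private final List<String> joinVars;
    private final List<Map<String, String>> left = new ArrayList<>();
    private final List<Map<String, String>> right = new ArrayList<>();

    SketchHashJoin(List<String> joinVars) {
        this.joinVars = joinVars;
    }

    // Analogue of process(queryResult, operandID): only collect, emit nothing yet.
    List<Map<String, String>> process(List<Map<String, String>> queryResult, int operandID) {
        if (operandID == 0) {
            left.addAll(queryResult);
        } else if (operandID == 1) {
            right.addAll(queryResult);
        } else {
            throw new IllegalArgumentException("binary join: operandID must be 0 or 1");
        }
        return List.of(); // no output is possible before both inputs are complete
    }

    // Values of the join variables, used as hash key.
    private List<String> key(Map<String, String> solution) {
        List<String> k = new ArrayList<>();
        for (String v : joinVars) {
            k.add(solution.get(v));
        }
        return k;
    }

    // Analogue of the end-of-evaluation hook: all input is present now.
    List<Map<String, String>> endOfEvaluation() {
        // Build phase: hash the left operand's solutions by their join values.
        Map<List<String>, List<Map<String, String>>> table = new HashMap<>();
        for (Map<String, String> l : left) {
            table.computeIfAbsent(key(l), k -> new ArrayList<>()).add(l);
        }
        // Probe phase: merge each right solution with every left solution sharing its key.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> r : right) {
            for (Map<String, String> l : table.getOrDefault(key(r), List.of())) {
                Map<String, String> merged = new HashMap<>(l);
                merged.putAll(r);
                result.add(merged);
            }
        }
        return result;
    }
}
```

Both inputs are buffered in main memory here; this is exactly what makes the operator a pipeline breaker, since no joined solution can be produced until the left side is fully hashed.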

To consume the result of an operand, it is recommended to use queryResult.oneTimeIterator(), which returns an iterator over Bindings objects. Note that oneTimeIterator() allows you to iterate only once through the result of an operand (afterwards, the result of the operand may be discarded). If a single pass is sufficient, this is the most efficient way to process the result, and it follows the iterator concept of database operators. If you have to iterate several times through the result of an operand, use iterator() instead, which internally stores all results in main memory or on disk (depending on the configuration and the size of the result) for subsequent iterations.
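The difference between the two access methods can be sketched with a small wrapper around a single-pass source. This is an illustrative stand-in, not LUPOSDATE's QueryResult: the class `SketchQueryResult` is hypothetical, the type parameter `T` plays the role of Bindings, and buffering happens only in main memory, whereas the real implementation may also use disk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a query result backed by a single-pass source.
class SketchQueryResult<T> {
    private Iterator<T> source;   // single-pass input, e.g. an operand's pipeline
    private List<T> materialized; // filled only if iterator() is used

    SketchQueryResult(Iterator<T> source) {
        this.source = source;
    }

    // Cheapest access: hands out the source itself without buffering.
    // Afterwards the source is forgotten, so a second call fails.
    Iterator<T> oneTimeIterator() {
        if (materialized != null) {
            return materialized.iterator();
        }
        if (source == null) {
            throw new IllegalStateException("result already consumed");
        }
        Iterator<T> it = source;
        source = null;
        return it;
    }

    // Re-iterable access: buffers all solutions once so that they can be
    // traversed any number of times.
    Iterator<T> iterator() {
        if (materialized == null) {
            if (source == null) {
                throw new IllegalStateException("result already consumed by oneTimeIterator()");
            }
            materialized = new ArrayList<>();
            source.forEachRemaining(materialized::add);
            source = null;
        }
        return materialized.iterator();
    }
}
```

The trade-off is visible in the fields: oneTimeIterator() never allocates a buffer, while iterator() pays for storing the whole result in exchange for repeatable traversal.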

There are two main ways to generate a QueryResult:

  1. The first way is to compute the complete QueryResult of the current operator and return it. This is easier for the programmer: just use lupos.datastructures.queryresult.QueryResult.createInstance() to create a new QueryResult object and add single solutions to it with the add(Bindings bindings) method. However, this way is usually less efficient than the second one.

  2. The second way is to use the iterator concept of database operators. For this purpose, use the lupos.datastructures.queryresult.QueryResult.createInstance(iterator) method, where iterator is an iterator object. Several types of iterators are supported. java.util.Iterator<Bindings> declares the boolean hasNext() method, which returns whether the iterator has another Bindings object, and the Bindings next() method, which retrieves the next Bindings object. lupos.datastructures.queryresult.ParallelIterator<Bindings> adds a close() method, which should be called when the iterator is closed. In this case, you should also add public void finalize() { close(); } to your iterator class, so that close() is still called, when the garbage collector removes the object, even if no operator closed the iterator explicitly. Finally, the lupos.datastructures.queryresult.SIPParallelIterator<Bindings, Bindings> interface adds the Bindings next(Bindings key) method, which retrieves the next Bindings object that is equal to or larger than key. Using this iterator is only meaningful if the input is sorted, but it may greatly improve performance by "jumping over" unneeded intermediate results.
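The two ways of generating a result can be contrasted on a simple filter. The sketch below is hypothetical and uses plain Java collections instead of QueryResult.createInstance(): the eager variant builds the complete result first, while the lazy variant returns an iterator that pulls from its input only on demand. `CloseableIterator` is an illustrative interface that only mirrors the idea of ParallelIterator (an Iterator with close()); it assumes input elements are never null.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical interface mirroring the idea of ParallelIterator: an Iterator with close().
interface CloseableIterator<T> extends Iterator<T> {
    void close();
}

// Hypothetical sketch contrasting eager and lazy result generation for a filter.
class SketchFilter {

    // Way 1 (eager): compute the complete result, adding each solution to a collection.
    static <T> List<T> eager(Iterator<T> input, Predicate<T> condition) {
        List<T> result = new ArrayList<>();
        while (input.hasNext()) {
            T solution = input.next();
            if (condition.test(solution)) {
                result.add(solution);
            }
        }
        return result;
    }

    // Way 2 (lazy): return an iterator that computes the next solution only when asked.
    // Assumes the input contains no null elements (null marks "no lookahead").
    static <T> CloseableIterator<T> lazy(Iterator<T> input, Predicate<T> condition) {
        return new CloseableIterator<T>() {
            private T lookahead;    // next matching solution, if already found
            private boolean closed; // once closed, no more input is pulled

            @Override
            public boolean hasNext() {
                while (!closed && lookahead == null && input.hasNext()) {
                    T solution = input.next();
                    if (condition.test(solution)) {
                        lookahead = solution;
                    }
                }
                return lookahead != null;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T solution = lookahead;
                lookahead = null;
                return solution;
            }

            @Override
            public void close() {
                closed = true;     // stop pulling from the input
                lookahead = null;
            }
        };
    }
}
```

The lazy variant holds at most one lookahead solution at a time, whereas the eager variant keeps the whole filtered result in memory before returning anything.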
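The benefit of a next(key) method on sorted input can be illustrated with a merge join that lets each side jump ahead to the other side's current key. This is a simplified, hypothetical sketch in the spirit of SIPParallelIterator, not its actual implementation: integers stand in for Bindings compared by their join value, each input is assumed sorted and duplicate-free, and the jump uses binary search over an array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sorted iterator with a next(key) method: returns the next element >= key.
class SortedSipIterator {
    private final int[] sorted; // assumption: ascending order
    private int pos = 0;

    SortedSipIterator(int[] sorted) {
        this.sorted = sorted;
    }

    boolean hasNext() {
        return pos < sorted.length;
    }

    // Returns the next element, or null when the input is exhausted.
    Integer next() {
        return hasNext() ? sorted[pos++] : null;
    }

    // Jumps over all elements smaller than key via binary search,
    // instead of returning them one by one.
    Integer next(int key) {
        int lo = pos;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        pos = lo;
        return next();
    }
}

// Merge join of two sorted, duplicate-free key sequences using next(key) to skip.
class SipMergeJoin {
    static List<Integer> join(SortedSipIterator a, SortedSipIterator b) {
        List<Integer> out = new ArrayList<>();
        Integer x = a.next();
        Integer y = (x == null) ? null : b.next(x);
        while (x != null && y != null) {
            if (x.equals(y)) {
                out.add(x);
                x = a.next();
                if (x != null) {
                    y = b.next(x);
                }
            } else if (x < y) {
                x = a.next(y); // skip all of a's keys below b's current key
            } else {
                y = b.next(x); // skip all of b's keys below a's current key
            }
        }
        return out;
    }
}
```

For example, joining the keys 1, 3, 5, 7, 9, 11 with 5, 6, 11, 20 yields 5 and 11, and the keys 1, 3, 6, 7 and 9 are skipped by jumps rather than being handed to the join one at a time.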