Java VTL: Java implementation of VTL

The Java VTL project is an open source Java implementation of the VTL 1.1 draft specification. It follows the JSR-223 Java Scripting API and exposes a simple connector interface that can be implemented to integrate with any data store.

Visit the interactive reference manual for more information.

Modules

The project is divided into modules:

  • java-vtl-parent
    • java-vtl-parser, contains the lexer and parser for VTL.
    • java-vtl-model, VTL data model.
    • java-vtl-script, JSR-223 (ScriptEngine) implementation.
    • java-vtl-connector, connector API.
    • java-vtl-tools, various tools.

Usage

Add a dependency to the Maven project:

<dependency>
    <groupId>no.ssb.vtl</groupId>
    <artifactId>java-vtl-script</artifactId>
    <version>0.1.13-SNAPSHOT</version>
</dependency>

Evaluate VTL expressions

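import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;

// `connector` is an implementation of the Connector interface described below.
ScriptEngine engine = new VTLScriptEngine(connector);

Bindings bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);
// Each statement ends with a newline so the concatenated statements do not run together.
// eval() throws javax.script.ScriptException if the script cannot be evaluated.
engine.eval("ds1 := get(\"foo\")\n" +
            "ds2 := get(\"bar\")\n" +
            "ds3 := [ds1, ds2] {\n" +
            "   filter ds1.id = \"string\",\n" +
            "   total := ds1.measure + ds2.measure\n" +
            "}");

// The result of each assignment is available in the engine bindings.
System.out.println(bindings.get("ds3"));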

Connect to external systems

Java VTL uses the no.ssb.vtl.connector.Connector interface to read data from and export data to external systems.

The Connector interface defines three methods:

public interface Connector {

    boolean canHandle(String identifier);

    Dataset getDataset(String identifier) throws ConnectorException;

    Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;

}

The method canHandle(String identifier) is used by the engine to find which connector is able to provide a Dataset for a given identifier.

The method getDataset(String identifier) is then called to get the dataset. Example implementations can be found in the java-vtl-ssb-api-connector module, but a very crude Dataset implementation that a connector could return might look like this:

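import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;

// Dataset, DataStructure, DataPoint and Role are types from the Java VTL data model.
class StaticDataset implements Dataset {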

    private final DataStructure structure = DataStructure.builder()
            .put("id", Role.IDENTIFIER, String.class)
            .put("period", Role.IDENTIFIER, Instant.class)
            .put("measure", Role.MEASURE, Long.class)
            .put("attribute", Role.ATTRIBUTE, String.class)
            .build();

    @Override
    public Stream<DataPoint> getData() {

        List<Map<String, Object>> data = new ArrayList<>();
        Instant period = Instant.now();
        for (int i = 0; i < 100; i++) {
            // Create a new map for each row. Reusing a single instance would
            // make every list entry point at the same (last) set of values.
            Map<String, Object> row = new HashMap<>();
            row.put("id", "id #" + i);
            row.put("period", period);
            row.put("measure", Long.valueOf(i));
            row.put("attribute", "attribute #" + i);
            data.add(row);
        }

        return data.stream().map(structure::wrap);
    }

    @Override
    public Optional<Map<String, Integer>> getDistinctValuesCount() {
        return Optional.empty();
    }

    @Override
    public Optional<Long> getSize() {
        return Optional.of(100L);
    }

    @Override
    public DataStructure getDataStructure() {
        return structure;
    }
}
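
The lookup described above, where the engine asks each connector whether it can handle an identifier and then fetches the dataset from it, can be sketched roughly as follows. This is only an illustration under stated assumptions: the Dataset and Connector types below are simplified stand-ins declared locally so that the example compiles on its own, and PrefixConnector and ConnectorRegistry are hypothetical names that are not part of Java VTL. How the real engine iterates over its connectors may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Simplified stand-in for the library's Dataset type (assumption for this sketch).
interface Dataset {
    String name();
}

// Stand-in mirroring the two read-side methods of the Connector interface.
interface Connector {
    boolean canHandle(String identifier);

    Dataset getDataset(String identifier);
}

// Hypothetical connector that serves every identifier starting with a prefix.
class PrefixConnector implements Connector {
    private final String prefix;

    PrefixConnector(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public boolean canHandle(String identifier) {
        return identifier.startsWith(prefix);
    }

    @Override
    public Dataset getDataset(String identifier) {
        // Dataset has a single method here, so a lambda is enough.
        return () -> prefix + " dataset for " + identifier;
    }
}

// Hypothetical registry: the first connector that accepts the identifier wins.
class ConnectorRegistry {
    private final List<Connector> connectors = new ArrayList<>();

    void register(Connector connector) {
        connectors.add(connector);
    }

    Optional<Dataset> resolve(String identifier) {
        return connectors.stream()
                .filter(c -> c.canHandle(identifier))
                .findFirst()
                .map(c -> c.getDataset(identifier));
    }
}
```

In this sketch, registering a PrefixConnector for "file:" and another for "http:" and then resolving "http://x" returns the dataset from the second connector, while an identifier that no connector accepts yields an empty Optional. Keeping canHandle cheap matters in such a design, since it may be called once per connector for every identifier.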

Implementation roadmap

This is an overview of the implementation progress.

| Group | Operators | Progress | Comment |
|---|---|---|---|
| General purpose | round parenthesis | done | |
| General purpose | := (assignment) | done | |
| General purpose | membership | done | |
| General purpose | get | usable | The keep, filter and aggregate options are not implemented. |
| General purpose | put | usable | Defined in the grammar but not implemented. |
| Join expression | []{} | done | |
| Join clause | filter | done | |
| Join clause | keep | done | |
| Join clause | drop | done | |
| Join clause | fold | done | |
| Join clause | unfold | done | |
| Join clause | rename | done | |
| Join clause | := (assignment) | done | |
| Join clause | . (membership) | done | |
| Clauses | rename | done | |
| Clauses | filter | done | |
| Clauses | keep | done | |
| Clauses | calc | todo | |
| Clauses | attrcalc | todo | |
| Clauses | aggregate | todo | |
| Conditional | if-then-else | todo | |
| Conditional | nvl | done | |
| Validation | Comparisons (>, <, >=, <=, =, <>) | done | |
| Validation | in, not in, between | todo | |
| Validation | isnull | done | The implemented syntaxes are isnull(value), value is null and value is not null. |
| Validation | exist_in, not_exist_in | todo | |
| Validation | exist_in_all, not_exist_in_all | todo | |
| Validation | check | usable | The boolean dataset must be built manually (no lifting). |
| Validation | match_characters | todo | |
| Validation | match_values | todo | |
| Statistical | min, max | todo | |
| Statistical | hierarchy | usable | The inline definition is not supported. A dataset that has a correct structure can be used instead. |
| Statistical | aggregate | todo | |
| Relational | union | done | |
| Relational | intersect | todo | |
| Relational | symdiff | todo | |
| Relational | setdiff | done | |
| Relational | merge | todo | |
| Boolean | and | usable | Only inside join expression (no lifting). |
| Boolean | or | usable | Only inside join expression (no lifting). |
| Boolean | xor | usable | Only inside join expression (no lifting). |
| Boolean | not | usable | Only inside join expression (no lifting). |
| Mathematical | unary plus and minus | done | |
| Mathematical | addition, subtraction | done | |
| Mathematical | multiplication, division | done | |
| Mathematical | round, ceil, floor | done | |
| Mathematical | abs | done | |
| Mathematical | trunc | done | |
| Mathematical | power, exp, nroot | done | |
| Mathematical | ln, log | done | |
| Mathematical | mod | done | |
| String | length | todo | |
| String | concatenation | done | |
| String | trim | todo | |
| String | upper/lower case | todo | |
| String | substr | usable | No lifting. |
| String | indexof | todo | |
| String | date_from_string | usable | Dataset as input not implemented. Only YYYY date format accepted. |
| Outside specification | integer_from_string | done | |
| Outside specification | float_from_string | done | |
| Outside specification | string_from_number | done | |
