#35 provide code + example + doc for Output Splitting
justb4 committed Dec 13, 2016
1 parent 207d57e commit a049572
Showing 16 changed files with 975 additions and 579 deletions.
10 changes: 2 additions & 8 deletions STETL.iml
Original file line number Diff line number Diff line change
@@ -2,17 +2,11 @@
<module type="PYTHON_MODULE" version="4">
<component name="NewModuleRootManager" inherit-compiler-output="true">
<exclude-output />
<content url="file://$MODULE_DIR$/../contrib">
<sourceFolder url="file://$MODULE_DIR$/../contrib/astun/loadergit" isTestSource="false" />
</content>
<content url="file://$MODULE_DIR$">
<sourceFolder url="file://$MODULE_DIR$/tests" isTestSource="true" />
<sourceFolder url="file://$MODULE_DIR$/stetl" isTestSource="false" />
</content>
<content url="file:///opt/gpkgtools/gpkgtools">
<sourceFolder url="file:///opt/gpkgtools/gpkgtools/gpkgtools" isTestSource="false" />
<sourceFolder url="file://$MODULE_DIR$/tests" isTestSource="true" />
</content>
<orderEntry type="sourceFolder" forTests="false" />
<orderEntry type="jdk" jdkName="Python 2.7.6 (/usr/local/bin/python)" jdkType="Python SDK" />
<orderEntry type="jdk" jdkName="Python 2.7.12 (Homebrew /usr/local)" jdkType="Python SDK" />
</component>
</module>
4 changes: 3 additions & 1 deletion docs/cases.rst
@@ -82,7 +82,9 @@ Smart Emission
--------------

Sensors for air quality, meteo and audio at civilians. Project by University of Nijmegen/Gemeente Nijmegen with participation
by Geonovum. Stetl is used to transform a low-level sensor API to PostGIS and later on WMS/WFS/SOS.
by Geonovum. Stetl is used to transform a low-level sensor API to PostGIS and later on WMS/WFS/SOS and the SensorThings API.
An InfluxDB output is also being developed here.

This is also an example of how to use a Stetl Docker image:

See https://github.com/Geonovum/smartemission/tree/master/etl
6 changes: 5 additions & 1 deletion docs/code.rst
@@ -34,7 +34,7 @@ from an :class:`stetl.input.Input` via zero or more :class:`stetl.filter.Filter`

As a trivial example: an :class:`stetl.input.Input` could be an XML file, a :class:`stetl.filter.Filter` could represent
an XSLT file and an :class:`stetl.output.Output` a PostGIS database. This is effected by specialized classes in
the subpackages inputs, filters, and outputs.
the subpackages inputs, filters, and outputs. New in 1.1.0: :class:`stetl.Splitter` to split data to multiple Outputs.

.. automodule:: stetl.factory
:members:
@@ -60,6 +60,10 @@ the subpackages inputs, filters, and outputs.
:members:
:show-inheritance:

.. automodule:: stetl.splitter
:members:
:show-inheritance:


Components: Inputs
------------------
45 changes: 44 additions & 1 deletion docs/using.rst
@@ -58,8 +58,16 @@ a the Unix pipe symbol "|".
So the above Chain is ``input_xml_file|transformer_xslt|output_file``. The names
of the component sections like ``[input_xml_file]`` are arbitrary.

Note: since v1.1.0 a data stream can be split over multiple ``Outputs`` (see below) using a ``+``, for example::

[etl]
chains = input_xml_file|transformer_xslt|output_gml_file+output_wfs

Later versions will also provide combining of ``Inputs`` and ``Filter``-splitting.
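
The ``|`` and ``+`` notation can be read as "steps in sequence, with parallel components inside a step". The following is a minimal Python sketch of that reading, not Stetl's own parser; ``parse_chain`` is a hypothetical helper name introduced only for illustration.

```python
def parse_chain(chain):
    """Split one chain string into steps.

    Each step is a list of component names: one name for an ordinary
    step, several names where the step uses '+' to split the stream.
    (Hypothetical helper, not part of the Stetl API.)
    """
    steps = []
    for step in chain.split("|"):
        # Whitespace around names is ignored, so "a + b" equals "a+b".
        names = [name.strip() for name in step.split("+")]
        steps.append(names)
    return steps


steps = parse_chain("input_xml_file|transformer_xslt|output_gml_file+output_wfs")
print(steps)
# [['input_xml_file'], ['transformer_xslt'], ['output_gml_file', 'output_wfs']]
```

Only the last step holds more than one name here, which matches the note that only ``Outputs`` can be split in v1.1.0.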

Configuring Components
----------------------

Most Stetl Components, i.e. inputs, filters, outputs, have properties that can be configured within their
respective [section] in the config file. But what are the possible properties, values and defaults?
This is documented within each Component class using the ``@Config`` decorator much similar to the standard Python
@@ -336,7 +344,8 @@ The syntax: chains are separated by commas (steps are sill separated by pipe sym
Chains are executed in order. We can even reuse the
specified components from within the same file. Each will have a separate instance within a Chain.

For example in the `Top10NL example <https://github.com/geopython/stetl/blob/master/examples/top10nl/etl-top10nl.cfg>`_ we see three Chains::
For example in the `Top10NL example <https://github.com/geopython/stetl/blob/master/examples/top10nl/etl-top10nl.cfg>`_
we see three Chains::

[etl]
chains = input_sql_pre|schema_name_filter|output_postgres,
@@ -346,3 +355,37 @@ For example in the `Top10NL example <https://github.com/geopython/stetl/blob/mas
Here the Chain `input_sql_pre|schema_name_filter|output_postgres` sets up a PostgreSQL schema and
creates tables. `input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr` does the actual ETL and
`input_sql_post|schema_name_filter|output_postgres` does some PostgreSQL postprocessing.
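
As an illustration of the comma-separated syntax, the ``chains`` value can be broken into individual Chains roughly as follows. This is a minimal sketch under the assumption that commas only ever separate Chains; ``split_chains`` is a hypothetical helper name and not Stetl's actual implementation.

```python
def split_chains(value):
    """Return the individual chain strings from a comma-separated value.

    Hypothetical helper for illustration only. Line breaks and
    surrounding whitespace are stripped; empty entries are dropped.
    """
    return [chain.strip() for chain in value.split(",") if chain.strip()]


# The three Chains named in the Top10NL description above.
chains_value = """input_sql_pre|schema_name_filter|output_postgres,
    input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
    input_sql_post|schema_name_filter|output_postgres"""

chains = split_chains(chains_value)
for number, chain in enumerate(chains, start=1):
    print(number, chain)
```

The Chains come back in the order they were written, which is also the order in which they are executed.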

Multiple Outputs
----------------

In some cases we may want to split processed data over multiple ``Outputs``,
for example to produce output files in multiple formats such as GML and GeoJSON,
to publish converted (filtered) data to multiple remote services (SOS, SensorThings API),
or simply to debug by writing to both a target ``Output`` and ``StandardOutput``.

See issue https://github.com/geopython/stetl/issues/35 and
the `Splitter example <https://github.com/geopython/stetl/tree/master/examples/basics/15_splitoutput>`_.

Here the GML output is split over two ``Outputs`` by using a ``+`` in the ETL Chain definition::

# Transform input xml to valid GML file using an XSLT filter and pass to multiple outputs.

[etl]
chains = input_xml_file|transformer_xslt |output_file + output_std

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

[transformer_xslt]
class = filters.xsltfilter.XsltFilter
script = cities2gml.xsl

[output_file]
class = outputs.fileoutput.FileOutput
file_path = output/gmlcities.gml

[output_std]
class = outputs.standardoutput.StandardOutput
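
Conceptually, the ``+`` in the configuration above means that the same data is handed to each listed ``Output``. The Python sketch below illustrates that fan-out idea only; ``ListOutput`` and ``FanOut`` are hypothetical stand-ins and are not the classes in ``stetl.splitter``, whose actual interface may differ.

```python
class ListOutput:
    """Hypothetical stand-in for an Output: records every packet it receives."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def write(self, packet):
        self.received.append(packet)


class FanOut:
    """Hypothetical splitter: hands the same packet to each child output in order."""

    def __init__(self, outputs):
        self.outputs = list(outputs)

    def write(self, packet):
        for output in self.outputs:
            output.write(packet)


# Names mirror the sections in the config above; the payload is a placeholder.
output_file = ListOutput("output_file")
output_std = ListOutput("output_std")
splitter = FanOut([output_file, output_std])
splitter.write("<ogr:FeatureCollection/>")

for output in (output_file, output_std):
    print(output.name, output.received)
```

Because the splitter exposes the same ``write`` method as a single output in this sketch, the preceding step does not need to know whether it feeds one output or several.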

50 changes: 50 additions & 0 deletions examples/basics/15_splitoutput/cities2gml.xsl
@@ -0,0 +1,50 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Transform plain cities XML to valid GML.
Author: Just van den Broecke, Just Objects B.V.
-->
<xsl:stylesheet version="1.0"
xmlns:ogr="http://ogr.maptools.org/"
xmlns:gml="http://www.opengis.net/gml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/">
<ogr:FeatureCollection
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ogr="http://ogr.maptools.org/"
xmlns:gml="http://www.opengis.net/gml"
xsi:schemaLocation="http://ogr.maptools.org/ ../gmlcities.xsd http://www.opengis.net/gml http://schemas.opengis.net/gml/2.1.2/feature.xsd"
>
<gml:boundedBy>
<gml:Box>
<gml:coord><gml:X>-180.0</gml:X><gml:Y>-90.0</gml:Y></gml:coord>
<gml:coord><gml:X>180.0</gml:X><gml:Y>90.0</gml:Y></gml:coord>
</gml:Box>
</gml:boundedBy>
<!-- Loop through all cities. -->
<xsl:apply-templates/>
</ogr:FeatureCollection>
</xsl:template>

<!-- Make each city a gml:featureMember. -->
<xsl:template match="city">
<gml:featureMember>
<ogr:City>
<ogr:name>
<xsl:value-of select="name"/>
</ogr:name>
<ogr:geometry>
<gml:Point srsName="urn:ogc:def:crs:EPSG:4326">
<gml:coordinates><xsl:value-of select="lat"/>,<xsl:value-of select="lon"/></gml:coordinates>
</gml:Point>
</ogr:geometry>
</ogr:City>
</gml:featureMember>
</xsl:template>
</xsl:stylesheet>
19 changes: 19 additions & 0 deletions examples/basics/15_splitoutput/etl.cfg
@@ -0,0 +1,19 @@
# Transform input xml to valid GML file using an XSLT filter and pass to multiple outputs.

[etl]
chains = input_xml_file|transformer_xslt |output_file + output_std

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

[transformer_xslt]
class = filters.xsltfilter.XsltFilter
script = cities2gml.xsl

[output_file]
class = outputs.fileoutput.FileOutput
file_path = output/gmlcities.gml

[output_std]
class = outputs.standardoutput.StandardOutput
9 changes: 9 additions & 0 deletions examples/basics/15_splitoutput/etl.sh
@@ -0,0 +1,9 @@
#!/bin/sh
#
# Shortcut to call Stetl main.py with etl config.
#
# Author: Just van den Broecke
#
stetl -c etl.cfg


33 changes: 33 additions & 0 deletions examples/basics/15_splitoutput/gmlcities.xsd
@@ -0,0 +1,33 @@
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://ogr.maptools.org/" xmlns:ogr="http://ogr.maptools.org/"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:gml="http://www.opengis.net/gml"
elementFormDefault="qualified" version="1.0">
<xs:import namespace="http://www.opengis.net/gml"
schemaLocation="http://schemas.opengis.net/gml/2.1.2/feature.xsd"/>
<xs:element name="FeatureCollection" type="ogr:FeatureCollectionType" substitutionGroup="gml:_FeatureCollection"/>
<xs:complexType name="FeatureCollectionType">
<xs:complexContent>
<xs:extension base="gml:AbstractFeatureCollectionType">
<xs:attribute name="lockId" type="xs:string" use="optional"/>
<xs:attribute name="scope" type="xs:string" use="optional"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:element name="City" type="ogr:City_Type" substitutionGroup="gml:_Feature"/>
<xs:complexType name="City_Type">
<xs:complexContent>
<xs:extension base="gml:AbstractFeatureType">
<xs:sequence>
<xs:element name="name" nillable="false" minOccurs="1" maxOccurs="1">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:maxLength value="42"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="geometry" type="gml:PointPropertyType" nillable="false" minOccurs="1" maxOccurs="1"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:schema>
18 changes: 18 additions & 0 deletions examples/basics/15_splitoutput/input/cities.xml
@@ -0,0 +1,18 @@
<?xml version='1.0' encoding='utf-8'?>
<cities>
<city>
<name>Amsterdam</name>
<lat>52.4</lat>
<lon>4.9</lon>
</city>
<city>
<name>Bonn</name>
<lat>50.7</lat>
<lon>7.1</lon>
</city>
<city>
<name>Rome</name>
<lat>41.9</lat>
<lon>12.5</lon>
</city>
</cities>
45 changes: 45 additions & 0 deletions examples/basics/15_splitoutput/output/gmlcities.gml
@@ -0,0 +1,45 @@
<?xml version='1.0' encoding='utf-8'?>
<ogr:FeatureCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ogr="http://ogr.maptools.org/" xmlns:gml="http://www.opengis.net/gml" xsi:schemaLocation="http://ogr.maptools.org/ ../gmlcities.xsd http://www.opengis.net/gml http://schemas.opengis.net/gml/2.1.2/feature.xsd">
<gml:boundedBy>
<gml:Box>
<gml:coord>
<gml:X>-180.0</gml:X>
<gml:Y>-90.0</gml:Y>
</gml:coord>
<gml:coord>
<gml:X>180.0</gml:X>
<gml:Y>90.0</gml:Y>
</gml:coord>
</gml:Box>
</gml:boundedBy>
<gml:featureMember>
<ogr:City>
<ogr:name>Amsterdam</ogr:name>
<ogr:geometry>
<gml:Point srsName="urn:ogc:def:crs:EPSG:4326">
<gml:coordinates>52.4,4.9</gml:coordinates>
</gml:Point>
</ogr:geometry>
</ogr:City>
</gml:featureMember>
<gml:featureMember>
<ogr:City>
<ogr:name>Bonn</ogr:name>
<ogr:geometry>
<gml:Point srsName="urn:ogc:def:crs:EPSG:4326">
<gml:coordinates>50.7,7.1</gml:coordinates>
</gml:Point>
</ogr:geometry>
</ogr:City>
</gml:featureMember>
<gml:featureMember>
<ogr:City>
<ogr:name>Rome</ogr:name>
<ogr:geometry>
<gml:Point srsName="urn:ogc:def:crs:EPSG:4326">
<gml:coordinates>41.9,12.5</gml:coordinates>
</gml:Point>
</ogr:geometry>
</ogr:City>
</gml:featureMember>
</ogr:FeatureCollection>
1 change: 1 addition & 0 deletions examples/basics/README.md
@@ -19,3 +19,4 @@ As a general Stetl-health test you may run all examples using `./runall.sh`.
* 12_gdal_ogr - direct OgrInput (and later output)
* 13_dbinput - input from SQL sources, here SQLite Input
* 14_logfileinput - input from Apache Logfile
* 15_splitoutput - split input over multiple outputs