Showing 17 changed files with 3,158 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
Loading TOP100NL with the Stetl (www.stetl.org) ETL framework. | ||
By: Just van den Broecke, | ||
GFS and XSLT by Frank Steggink | ||
|
||
This directory contains the ETL configuration and command for using Stetl to write | ||
TOP100NL from the source GML files to various outputs. | ||
The default is PostGIS, but because output goes through ogr2ogr it can be | ||
any output that ogr2ogr supports, e.g. SHP, GeoJSON or GeoPackage, and in theory also e.g. Oracle. | ||
|
||
To use Stetl, the external GitHub submodule externals/stetl | ||
must be present. | ||
|
||
When cloning the GitHub repository, Stetl is included as follows: | ||
git clone --recursive https://github.com/nlextract/NLExtract.git | ||
Stetl then comes along and does not need to be installed separately; only the Stetl dependencies do. | ||
|
||
Installing the Stetl dependencies: | ||
http://www.stetl.org/en/latest/install.html | ||
|
||
More about Stetl: http://stetl.org | ||
|
||
Command | ||
------- | ||
|
||
./etl-top100nl.sh | ||
Windows: etl-top100nl.cmd | ||
|
||
Uses the default options (database params etc.) from options/default.args. | ||
|
||
Stetl configuration; does not need to be changed unless e.g. a different output is desired: | ||
conf/etl-top100nl-v1.1.cfg | ||
|
||
Options/arguments | ||
----------------- | ||
|
||
A number of options can be overridden in 2 ways: | ||
|
||
1- Implicitly: override the default options (database params etc.) with your own local file based on | ||
the local hostname: options/<your hostname>.args | ||
|
||
2- Explicitly on the command line via ./etl-top100nl.sh <my options file>.args | ||
etl-top100nl.cmd <my options file>.args | ||
|
||
If method 2 is used, it takes precedence over method 1 and the default options! | ||
|
||
Database mapping | ||
---------------- | ||
gfs/top100-v1.1.gfs is the GDAL/OGR "GFS Template" and determines the mapping of GML elements/attributes | ||
to PostGIS column (names). If needed, create your own GFS file and specify it in your | ||
options/<your hostname>.args, e.g. gfs_template=gfs/mijntop100.gfs | ||
|
||
TODO | ||
---- | ||
* GUI |
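The precedence described under "Options/arguments" (explicit file over host-based file over default) can be sketched in Python. This is only an illustration of the selection order, not code from the repository: the function name `select_options_file` and its parameters are hypothetical. It follows the shell wrapper in that an explicit file is only used when it actually exists.

```python
from pathlib import Path
from typing import Optional


def select_options_file(options_dir: Path, hostname: str,
                        explicit: Optional[str] = None) -> Path:
    """Pick the options file: explicit > host-based > default (hypothetical helper)."""
    # Start from the default options file.
    chosen = options_dir / "default.args"

    # A host-based file options/<hostname>.args overrides the default if present.
    host_file = options_dir / f"{hostname}.args"
    if host_file.is_file():
        chosen = host_file

    # An explicitly given file wins over both, but only if it exists.
    if explicit and Path(explicit).is_file():
        chosen = Path(explicit)

    return chosen
```

A call such as `select_options_file(Path("options"), socket.gethostname())` would return the host-based file when it exists and `options/default.args` otherwise.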
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
# Example of process-chains for extracting TOP100NL source data from GML to PostGIS. | ||
# A Chain is a series of Components: one Input, zero or more Filters and one Output. | ||
# The output of a Component is connected to the input of the next Component (except for | ||
# the final Output Component, which writes to the final destination, e.g. Postgres). | ||
# | ||
# Currently 3 chains are executed in the following order: | ||
# - SQL pre: DB initialization, delete tables, create schema | ||
# - Main ETL chain, consists of the following components | ||
# 1. input_zip_file: reads files from input ZIP file(s) | ||
# 2. extract_zip_file: extracts a GML file from a ZIP file | ||
# 3. parse_gml_file: parses elements from a GML file | ||
# 4. xml_assembler: assemble feature elements into smaller (etree) docs | ||
# 5. transformer_xslt: transform each (etree) doc | ||
# 6. packet_writer: writes the transformed GML document to a file | ||
# 7. output_ogr2ogr: output using ogr2ogr, input is a transformed GML file, output can be any OGR output | ||
# - SQL post: remove duplicates, update multi-attributes | ||
# | ||
# Any substitutable values are specified in curly brackets e.g. {password}. | ||
# Actual values can be passed as args to Stetl main.py or as arguments from a wrapper program | ||
# like top100extract.py to etl.py. Here are the 3 chains: | ||
|
||
[etl] | ||
chains = input_sql_pre|schema_name_filter|output_postgres, | ||
input_zip_file|extract_zip_file|parse_gml_file|xml_assembler|transformer_xslt|packet_writer|output_ogr2ogr, | ||
input_sql_post|schema_name_filter|output_postgres | ||
|
||
# Pre SQL file inputs to be executed | ||
[input_sql_pre] | ||
class = inputs.fileinput.StringFileInput | ||
file_path = sql/drop-tables-v1.1.sql,sql/create-schema.sql | ||
|
||
# Post SQL file inputs to be executed | ||
[input_sql_post] | ||
class = inputs.fileinput.StringFileInput | ||
file_path = sql/delete-duplicates-v1.1.sql,sql/update-multiattributes-v1.1.sql | ||
|
||
# Generic filter to substitute Python-format string values like {schema} in a string | ||
[schema_name_filter] | ||
class = filters.stringfilter.StringSubstitutionFilter | ||
# format args {schema} is schema name | ||
format_args = schema:{schema} | ||
|
||
[output_postgres] | ||
class = outputs.dboutput.PostgresDbOutput | ||
database = {database} | ||
host = {host} | ||
port = {port} | ||
user = {user} | ||
password = {password} | ||
schema = {schema} | ||
|
||
# The source input ZIP-file(s) from dir, producing 'records' with ZIP file name and inner file names | ||
[input_zip_file] | ||
class=inputs.fileinput.ZipFileInput | ||
file_path = {input_dir} | ||
filename_pattern = *.[zZ][iI][pP] | ||
name_filter=*.[gG][mM][lL] | ||
|
||
# Filter to extract the files in a ZIP file one by one to a temporary location | ||
[extract_zip_file] | ||
class=filters.zipfileextractor.ZipFileExtractor | ||
file_path = {temp_dir}/fromzip-tmp.gml | ||
|
||
# Reads the extracted GML file, producing FeatureMember elements | ||
[parse_gml_file] | ||
class = filters.xmlelementreader.XmlElementReader | ||
element_tags = FeatureMember | ||
|
||
# Assembles gml:featureMember elements into etree docs, each with at most "max_elements" elements | ||
[xml_assembler] | ||
class = filters.xmlassembler.XmlAssembler | ||
max_elements = {max_features} | ||
container_doc = <?xml version="1.0" encoding="UTF-8"?> | ||
<top100nl:FeatureCollectionTop100 | ||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xmlns:top100nl="http://register.geostandaarden.nl/gmlapplicatieschema/top100nl/1.1.0" | ||
xmlns:gml="http://www.opengis.net/gml/3.2" | ||
xsi:schemaLocation="http://register.geostandaarden.nl/gmlapplicatieschema/top100nl/1.1.0 top100nl.xsd" | ||
gml:id="Top100NL_FC"> | ||
</top100nl:FeatureCollectionTop100> | ||
element_container_tag = FeatureCollectionTop100 | ||
|
||
# Transforms into simple/flat feature data (single geometry per feature type, single attrs) | ||
[transformer_xslt] | ||
class = filters.xsltfilter.XsltFilter | ||
script = xsl/top100-split_v1.1.xsl | ||
|
||
# Writes the payload of a packet as a string to a file | ||
[packet_writer] | ||
class = filters.packetwriter.PacketWriter | ||
file_path = {temp_dir}/top100-tmp.gml | ||
|
||
# The ogr2ogr command-line, may use any output here, as long as | ||
# the input is a GML file. The "temp_file" is where etree-docs | ||
# are saved. It has to be the same file as in the ogr2ogr command. | ||
# TODO: find a way to use a GML-stream through stdin to ogr2ogr | ||
[output_ogr2ogr] | ||
class = outputs.execoutput.Ogr2OgrExecOutput | ||
# destination format: OGR vector format name | ||
dest_format = PostgreSQL | ||
# destination datasource: name of datasource | ||
dest_data_source = "PG:dbname={database} host={host} port={port} user={user} password={password} active_schema={schema}" | ||
# layer creation options will only be added to ogr2ogr on first run | ||
lco = -lco LAUNDER=YES -lco PRECISION=NO | ||
# spatial_extent, translates to -spat xmin ymin xmax ymax | ||
spatial_extent = {spatial_extent} | ||
# gfs template | ||
gfs_template = gfs/top100-v1.1.gfs | ||
# miscellaneous ogr2ogr options | ||
options = -append -gt 65536 {multi_opts} --config PG_USE_COPY NO | ||
# cleanup input? | ||
cleanup_input = True | ||
|
||
# Validator for XML | ||
[xml_schema_validator] | ||
class = filters.xmlvalidator.XmlSchemaValidator | ||
xsd = http://register.geostandaarden.nl/gmlapplicatieschema/top100nl/1.1.0/top100nl.xsd | ||
enabled = False |
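As a rough illustration of how the `chains` value and the `{placeholder}` values in this configuration fit together, the sketch below splits a chains string into component names and fills placeholders with Python string formatting. It is not Stetl's own implementation; `parse_chains`, `substitute` and the example argument values are assumptions made for this example.

```python
from typing import Dict, List


def parse_chains(chains_value: str) -> List[List[str]]:
    """Split a 'chains' value into chains, each a list of section names."""
    chains = []
    # Chains are separated by commas, components within a chain by '|'.
    for chain in chains_value.split(","):
        names = [name.strip() for name in chain.split("|") if name.strip()]
        if names:
            chains.append(names)
    return chains


def substitute(template: str, args: Dict[str, str]) -> str:
    """Fill {name} placeholders; raises KeyError if an argument is missing."""
    return template.format(**args)


chains = parse_chains(
    "input_sql_pre|schema_name_filter|output_postgres,\n"
    "input_zip_file|extract_zip_file|parse_gml_file|xml_assembler|"
    "transformer_xslt|packet_writer|output_ogr2ogr,\n"
    "input_sql_post|schema_name_filter|output_postgres"
)

# Hypothetical argument values, for illustration only.
example_args = {"database": "top100nl", "host": "localhost", "schema": "public"}
data_source = substitute(
    "PG:dbname={database} host={host} active_schema={schema}", example_args
)
```

Each name in a parsed chain corresponds to a `[section]` in the configuration above, which is why the SQL pre and post chains can share `schema_name_filter` and `output_postgres`.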
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
:: ETL for TOP100NL GML using Stetl. | ||
:: | ||
:: This is a front-end/wrapper batch script that ultimately calls Stetl with a configuration | ||
:: (etl-top100nl-v1.1.cfg) and parameters (options\myoptions.args). This script is | ||
:: based on the shell script ../../../brk/etl-brk.sh. | ||
:: | ||
:: Author: Frank Steggink | ||
@echo off | ||
|
||
setlocal | ||
|
||
:: Use the Stetl shipped with NLExtract (in theory this can also be Stetl installed via pip install stetl) | ||
if "%STETL_HOME%"=="" ( | ||
set STETL_HOME=../../../externals/stetl | ||
) | ||
|
||
:: Needed for imports | ||
if "%PYTHONPATH%"=="" ( | ||
set PYTHONPATH=%STETL_HOME% | ||
) else ( | ||
set PYTHONPATH=%STETL_HOME%;%PYTHONPATH% | ||
) | ||
|
||
:: Default arguments/options | ||
set options_file=options\default.args | ||
|
||
:: Optionally override the default options file with a host-based options file | ||
:: options\<hostname>.args. | ||
if exist options\%COMPUTERNAME%.args set options_file=options\%COMPUTERNAME%.args | ||
|
||
:: Optionally override via the command line: etl-top100nl.cmd <my options file> | ||
if not "%~1"=="" set options_file=%1 | ||
|
||
:: Final command. Can also simply be "stetl -c conf\etl-top100nl-v1.1.cfg -a ..." if Stetl is installed | ||
python %STETL_HOME%\stetl\main.py -c conf\etl-top100nl-v1.1.cfg -a %options_file% | ||
|
||
endlocal |
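The batch script and the shell script differ mainly in the PYTHONPATH separator (`;` on Windows, `:` elsewhere). A minimal Python sketch of the same "prepend STETL_HOME" step is shown here; the helper name `prepend_pythonpath` is hypothetical and the function is not part of the repository.

```python
import os
from typing import Optional


def prepend_pythonpath(stetl_home: str, existing: Optional[str],
                       sep: str = os.pathsep) -> str:
    """Put stetl_home in front of an existing PYTHONPATH value, if any."""
    # With no existing PYTHONPATH, the result is just the Stetl directory.
    if not existing:
        return stetl_home
    # Otherwise prepend it, using the platform's path separator by default.
    return f"{stetl_home}{sep}{existing}"
```

Leaving `sep` at its default makes `os.pathsep` pick the right separator for the platform the code runs on.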
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
#!/bin/bash | ||
# | ||
# ETL for TOP100NL GML using Stetl. | ||
# | ||
# This is a front-end/wrapper shell script that ultimately calls Stetl with a configuration | ||
# (etl-top100nl-v1.1.cfg) and parameters (options/myoptions.args). | ||
# | ||
# Author: Just van den Broecke | ||
# | ||
|
||
# Use the Stetl shipped with NLExtract (in theory this can also be Stetl installed via pip install stetl) | ||
if [ -z "$STETL_HOME" ]; then | ||
STETL_HOME=../../../externals/stetl | ||
fi | ||
|
||
# Needed for imports | ||
if [ -z "$PYTHONPATH" ]; then | ||
export PYTHONPATH=$STETL_HOME | ||
else | ||
export PYTHONPATH=$STETL_HOME:$PYTHONPATH | ||
fi | ||
|
||
# Default arguments/options | ||
options_file=options/default.args | ||
|
||
# Optionally overrides the default options file with a host-based file options/<your hostname>.args | ||
# To add your local host, add <your hostname>.args to the options directory | ||
host_options_file=options/`hostname`.args | ||
|
||
[ -f "$host_options_file" ] && options_file=$host_options_file | ||
|
||
# Optionally override via the command line: etl-top100nl.sh <my options file> | ||
[ -f "$1" ] && options_file=$1 | ||
|
||
# Final command. Can also simply be "stetl -c conf/etl-top100nl-v1.1.cfg -a ..." if Stetl is installed | ||
# python $STETL_HOME/stetl/main.py -c conf/etl-top100nl-v1.1.cfg -a "$pg_options temp_dir=temp max_features=$max_features gml_files=$gml_files $multi $spatial_extent" | ||
python "$STETL_HOME/stetl/main.py" -c conf/etl-top100nl-v1.1.cfg -a "$options_file" |
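For comparison, the final command that both wrapper scripts assemble can be sketched as an argument list in Python. The `-c` and `-a` flags, the default STETL_HOME and the config path come from the scripts above; the function `build_stetl_command` and the use of `sys.executable` in place of a bare `python` are assumptions of this sketch.

```python
import os
import sys
from typing import List, Mapping

# Same fallback as in the wrapper scripts when STETL_HOME is not set.
DEFAULT_STETL_HOME = "../../../externals/stetl"


def build_stetl_command(options_file: str,
                        env: Mapping[str, str] = os.environ,
                        config: str = "conf/etl-top100nl-v1.1.cfg") -> List[str]:
    """Build the argv for running Stetl's main.py (hypothetical helper)."""
    # Use STETL_HOME from the environment, or the bundled submodule path.
    stetl_home = env.get("STETL_HOME") or DEFAULT_STETL_HOME
    main_py = os.path.join(stetl_home, "stetl", "main.py")
    # -c selects the Stetl configuration, -a the options/arguments file.
    return [sys.executable, main_py, "-c", config, "-a", options_file]
```

The resulting list could be handed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids quoting problems with paths that contain spaces.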