#60 HttpInput Auth support plus documentation added/revised
justb4 committed Oct 25, 2017
1 parent 69875d8 commit d207d80
Showing 19 changed files with 202 additions and 287 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@
*.pyc
VERSION.txt
build/
+_build/
dist/
htmlcov/
Stetl.egg-info
138 changes: 93 additions & 45 deletions docs/using.rst
@@ -63,7 +63,16 @@ Note: since v1.1.0 a datastream can be split (see below) to multiple ``Outputs``
[etl]
chains = input_xml_file|transformer_xslt|(output_gml_file)(output_wfs)

-In later versions also combining ``Inputs`` and ``Filter``-splitting will be provided.
+Or multiple Input streams can be combined/merged like: ::
+
+[etl]
+chains = (input_http_api_1) (input_http_api_2) | data_transformer | output_db
+
+It is even possible to have both Splitting and Merging together with filtering: ::
+
+[etl]
+chains = (input_http_api_1 | cleaner_filter) (input_http_api_2) | data_transformer | (output_db) (output_file)
+

Configuring Components
----------------------
@@ -82,49 +91,48 @@ For class authors: this information is added
via the Python Decorators much similar to ``@property``. The :class:`stetl.component.Config`
is used to define read-only properties for each Component instance. For example, ::

-class FileInput(Input):
-"""
-Abstract base class for specific FileInputs, use derived classes.
-"""
-
-# Start attribute config meta
-# Applying Decorator pattern with the Config class to provide
-# read-only config values from the configured properties.
-
-@Config(ptype=str, default=None, required=False)
-def file_path(self):
-"""
-Path to file or files or URLs: can be a dir or files or URLs
-or even multiple, comma separated. For URLs only JSON is supported now.
-
-Required: True
-
-Default: None
-"""
-pass
-
-@Config(ptype=str, default='*.[gxGX][mM][lL]', required=False)
-def filename_pattern(self):
-"""
-Filename pattern according to Python glob.glob for example:
-'\*.[gxGX][mM][lL]'
-
-Required: False
-
-Default: '\*.[gxGX][mM][lL]'
-"""
-pass
-
-# End attribute config meta
-
-def __init__(self, configdict, section, produces):
-Input.__init__(self, configdict, section, produces)
-
-# Create the list of files to be used as input
-self.file_list = Util.make_file_list(self.file_path, None, self.filename_pattern, self.depth_search)
-
-This defines two configurable properties for the class FileInput.
-Each ``@Config`` has three parameters: ``p_type``, the Python type (``str``, ``list``, ``dict``, ``bool``, ``int``),
+class FileInput(Input):
+"""
+Abstract base class for specific FileInputs, use derived classes.
+"""
+
+# Start attribute config meta
+# Applying Decorator pattern with the Config class to provide
+# read-only config values from the configured properties.
+
+@Config(ptype=str, default=None, required=False)
+def file_path(self):
+"""
+Path to file or files or URLs: can be a dir or files or URLs
+or even multiple, comma separated. For URLs only JSON is supported now.
+"""
+pass
+
+@Config(ptype=str, default='*.[gxGX][mM][lL]', required=False)
+def filename_pattern(self):
+"""
+Filename pattern according to Python ``glob.glob`` for example:
+'\\*.[gxGX][mM][lL]'
+"""
+pass
+
+@Config(ptype=bool, default=False, required=False)
+def depth_search(self):
+"""
+Should we recurse into sub-directories to find files?
+"""
+pass
+
+# End attribute config meta
+
+def __init__(self, configdict, section, produces):
+Input.__init__(self, configdict, section, produces)
+
+# Create the list of files to be used as input
+self.file_list = Util.make_file_list(self.file_path, None, self.filename_pattern, self.depth_search)
+
+This defines three configurable properties for the class FileInput.
+Each ``@Config`` has three parameters: ``ptype``, the Python type (``str``, ``list``, ``dict``, ``bool``, ``int``),
``default`` (default value if not present) and ``required`` (if property in mandatory or optional).

Within the config one can set specific
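Editor's note: the ``@Config`` mechanism described in this hunk (a decorator object that returns itself so that ``__get__`` serves a read-only, typed value) can be illustrated with a small stand-alone sketch. This is not the code in ``stetl/component.py``; the ``cfg`` dictionary, the ``DemoInput`` class and the boolean-string handling are assumptions made only for illustration.

```python
class Config(object):
    """Simplified stand-in for a Config decorator (illustrative only)."""

    def __init__(self, ptype=str, default=None, required=False):
        self.ptype = ptype
        self.default = default
        self.required = required
        self.name = None

    def __call__(self, fget):
        # Remember the property name and docstring; returning self turns
        # the decorated method into a descriptor on the class.
        self.name = fget.__name__
        self.__doc__ = fget.__doc__
        return self

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # Hypothetical storage: a plain dict called 'cfg' on the instance.
        value = instance.cfg.get(self.name, self.default)
        if value is None:
            if self.required:
                raise ValueError('missing required property: %s' % self.name)
            return None
        if self.ptype is bool and isinstance(value, str):
            # Config files hold strings, so map common spellings to bool.
            return value.strip().lower() in ('1', 'true', 'yes', 'on')
        return self.ptype(value)

    def __set__(self, instance, value):
        # Defining __set__ makes the property read-only.
        raise AttributeError('%s is read-only' % self.name)


class DemoInput(object):
    """Hypothetical component using the sketch above."""

    def __init__(self, cfg):
        self.cfg = cfg

    @Config(ptype=str, default=None, required=True)
    def file_path(self):
        """Path to the input file."""
        pass

    @Config(ptype=bool, default=False, required=False)
    def depth_search(self):
        """Recurse into sub-directories?"""
        pass


demo = DemoInput({'file_path': 'input/cities.xml', 'depth_search': 'True'})
print(demo.file_path, demo.depth_search)
```

The decorated method body is never executed; it only carries the property name and the docstring, which is why every property in the diff above ends in ``pass``.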
@@ -365,7 +373,7 @@ or to publish converted (Filtered) data to multiple remote services (SOS, Sensor
or just for simple debugging to a target ``Output`` and ``StandardOutput``.

See issue https://github.com/geopython/stetl/issues/35 and
-the `Chain Split example <https://github.com/geopython/stetl/tree/master/examples/basics/15_splitchain>`_.
+the `Chain Split example <https://github.com/geopython/stetl/tree/master/examples/basics/15_splitter>`_.

Here the Chains are split by using ``()`` in the ETL Chain definition: ::

@@ -391,3 +399,43 @@ Here the Chains are split by using ``()`` in the ETL Chain definition: ::

[output_std]
class = outputs.standardoutput.StandardOutput
+
+Chain Merging
+-------------
+
+In some cases we may want to merge (combine, join) multiple input streams.
+
+For example to harvest data from multiple HTTP REST APIs, or to realize a `Filter` that
+integrates data from two data-sources.
+
+See issue https://github.com/geopython/stetl/issues/59 and
+the `Chain Merge example <https://github.com/geopython/stetl/tree/master/examples/basics/16_merger>`_.
+
+Here the Chains are merged by using ``()`` notation in the ETL Chain definition, possibly even combined with Splitting
+Outputs: ::
+
+# Merge two inputs into single Filter.
+
+[etl]
+chains = (input_1) (input_2)|transformer_xslt|output_std,
+(input_1) (input_2)|transformer_xslt|(output_file)(output_std)
+
+
+[input_1]
+class = inputs.fileinput.XmlFileInput
+file_path = input1/cities.xml
+
+[input_2]
+class = inputs.fileinput.XmlFileInput
+file_path = input2/cities.xml
+
+[transformer_xslt]
+class = filters.xsltfilter.XsltFilter
+script = cities2gml.xsl
+
+[output_file]
+class = outputs.fileoutput.FileOutput
+file_path = output/gmlcities.gml
+
+[output_std]
+class = outputs.standardoutput.StandardOutput
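Editor's note: the ``()`` and ``|`` chain notation documented in ``docs/using.rst`` can be read as a sequence of stages, where each stage holds one or more parallel branches. The sketch below shows one possible way to decompose a single chain string into that shape. It is not Stetl's own parser; the function names ``split_top_level`` and ``parse_chain`` are hypothetical, and a ``chains`` value holding several comma-separated chains would have to be split on the commas first.

```python
import re


def split_top_level(text, sep='|'):
    """Split on sep, ignoring separators inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        if ch == sep and depth == 0:
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    return parts


def parse_chain(chain):
    """Return a list of stages; each stage is a list of branches,
    and each branch is a list of component section names."""
    stages = []
    for stage in split_top_level(chain):
        # Parenthesised groups are parallel branches (merge or split).
        groups = re.findall(r'\(([^()]*)\)', stage)
        if not groups:
            groups = [stage]
        stages.append([[name.strip() for name in group.split('|')]
                       for group in groups])
    return stages


merged = parse_chain('(input_1) (input_2)|transformer_xslt|(output_file)(output_std)')
for stage in merged:
    print(stage)
```

A stage with more than one branch at the start of the chain corresponds to merging Inputs, and at the end to splitting Outputs; a branch with more than one name, such as ``(input_http_api_1 | cleaner_filter)``, is a sub-chain that is filtered before it is merged.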
4 changes: 2 additions & 2 deletions examples/basics/16_merger/etl.cfg
@@ -1,8 +1,8 @@
# Merge two inputs into single Filter.

[etl]
-chains = (input_1) (input_2)|transformer_xslt|output_std
-# ,(input_1) (input_2)|transformer_xslt|(output_file)(output_std)
+chains = (input_1) (input_2)|transformer_xslt|output_std,
+(input_1) (input_2)|transformer_xslt|(output_file)(output_std)


[input_1]
14 changes: 6 additions & 8 deletions stetl/component.py
@@ -30,7 +30,7 @@ def __init__(self, ptype=str, default=None, required=False):
self.default = default
self.required = required

-def __call__(self, fget, doc=None):
+def __call__(self, fget, doc=''):
"""
The __call__ method is not called until the
decorated function is called. self is returned such that __get__ below is called
@@ -44,19 +44,21 @@ def __call__(self, fget, doc=None):
# For Spinx documentation build we need the original function with docstring.
IS_SPHINX_BUILD = bool(os.getenv('SPHINX_BUILD'))
if IS_SPHINX_BUILD:
-fget.__doc__ = '``CONFIG`` - %s' % fget.__doc__
+doc = doc.strip()
# TODO more detail, example below
# doc = '``Parameter`` - %s\n\n' % doc
# doc += '* type: %s\n' % self.ptype
#
+doc += '* type: %s\n' % str(self.ptype).split("'")[1]
+doc += '* required: %s\n' % self.required
+doc += '* default: %s\n' % self.default

# if self.value:
# doc += '* value: %s\n' % self.value
# else:
# doc += '* required: %s\n' % self.required
# doc += '* default: %s\n' % self.default
# doc += '* value_range: %s\n' % self.value_range

+fget.__doc__ = '``CONFIG`` %s\n%s' % (fget.__doc__, doc)
return fget
else:
return self
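Editor's note: the Sphinx branch of ``__call__`` in this hunk builds a small bullet list (type, required, default) that is appended to each property docstring, which is why the hand-written ``Required:`` / ``Default:`` lines are dropped from the docstrings elsewhere in this commit. The stand-alone sketch below reproduces only that string-building step; the helper name ``describe_config`` is hypothetical and not part of Stetl.

```python
def describe_config(ptype=str, default=None, required=False, doc=''):
    """Build the bullet list that is appended to a property docstring."""
    doc = doc.strip()
    # str(str) is "<class 'str'>" on Python 3 and "<type 'str'>" on Python 2;
    # splitting on the quote character leaves the bare type name at index 1.
    doc += '* type: %s\n' % str(ptype).split("'")[1]
    doc += '* required: %s\n' % required
    doc += '* default: %s\n' % default
    return doc


summary = describe_config(ptype=bool, default=False, required=False)
print(summary)
```

For built-in types the same name is also available as ``ptype.__name__``, which avoids depending on the textual form of ``str(ptype)``.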
@@ -101,17 +103,13 @@ class Component(object):
def input_format(self):
"""
The specific input format if the consumes parameter is a list or the format to be converted to the output_format.
-Required: False
-Default: None
"""
pass

@Config(ptype=str, default=None, required=False)
def output_format(self):
"""
The specific output format if the produces parameter is a list or the format to which the input format is converted.
-Required: False
-Default: None
"""
pass

2 changes: 1 addition & 1 deletion stetl/etl.py
@@ -70,7 +70,7 @@ def __init__(self, options_dict, args_dict=None):
else:
# Parse config file directly
self.configdict.read(config_file)
-except Exception, e:
+except Exception as e:
log.error("Fatal Error reading config file: err=%s" % str(e))


6 changes: 0 additions & 6 deletions stetl/filters/formatconverter.py
@@ -32,12 +32,6 @@ class FormatConverter(Filter):
def converter_args(self):
"""
Custom converter-specific arguments.
-Type: dictionary
-Required: False
-Default: None
"""
pass

4 changes: 0 additions & 4 deletions stetl/filters/packetwriter.py
@@ -27,10 +27,6 @@ class PacketWriter(Filter):
def file_path(self):
"""
File path to write content to.
-Required: True
-Default: None
"""
pass

10 changes: 1 addition & 9 deletions stetl/filters/templatingfilter.py
@@ -36,17 +36,13 @@ class TemplatingFilter(Filter):
def template_file(self):
"""
Path to template file. One of template_file or template_string needs to be configured.
-Required: False
-Default: None
"""
pass

@Config(ptype=str, default=None, required=False)
def template_string(self):
"""
Template string. One of template_file or template_string needs to be configured.
-Required: False
-Default: None
"""
pass

@@ -147,8 +143,6 @@ class Jinja2TemplatingFilter(TemplatingFilter):
def template_search_paths(self):
"""
List of directories where to search for templates, default is current working directory only.
-Required: False
-Default: [os.getcwd()]
"""
pass

@@ -157,8 +151,6 @@ def template_globals_path(self):
"""
One or more JSON files or URLs with global variables that can be used anywhere in template.
Multiple files will be merged into one globals dictionary
-Required: False
-Default: None
"""
pass

@@ -174,7 +166,7 @@ def __init__(self, configdict, section):
def create_template(self):
try:
from jinja2 import Environment, FileSystemLoader
-except Exception, e:
+except Exception as e:
log.error(
'Cannot import modules from Jinja2, err= %s; You probably need to install Jinja2 first, see http://jinja.pocoo.org',
str(e))
2 changes: 1 addition & 1 deletion stetl/filters/xmlassembler.py
@@ -69,7 +69,7 @@ def flush_elements(self, packet):
# Start new doc (TODO clone)
try:
etree_doc = etree.fromstring(self.container_doc, self.xml_parser)
-except Exception, e:
+except Exception as e:
log.error("new container doc not OK")
return packet

8 changes: 0 additions & 8 deletions stetl/filters/xmlelementreader.py
@@ -30,21 +30,13 @@ def element_tags(self):
"""
Comma-separated string of XML (feature) element tag names of the elements that should be extracted
and added to the output element stream.
-Required: True
-Default: None
"""
pass

@Config(ptype=bool, default=False, required=False)
def strip_namespaces(self):
"""
should namespaces be removed from the input document and thus not be present in the output element stream?
-Required: False
-Default: False
"""
pass

4 changes: 0 additions & 4 deletions stetl/filters/zipfileextractor.py
@@ -26,10 +26,6 @@ class ZipFileExtractor(Filter):
def file_path(self):
"""
File name to write the extracted file to.
-Required: True
-Default: None
"""
pass
