Provide OGC-specific/semantic health-checks #82

justb4 · 2017-02-08T16:02:11Z

In GHC, OGC Resources are checked for a successful GetCapabilities response (<title> element), but that response may even come from a static file. OGC services on that endpoint can fail for many reasons. Usually one notices on Get-requests (WMS GetMap, WFS GetFeature etc.) that the service is "unhealthy" without hard failures, e.g. a blank WMS image (with an Exception in the image), a zero WFS feature count etc. I realize that generic, auto-generated, crawling Get* requests are tricky to implement, with unwanted performance impacts caused by random OWS requests (think of a GetFeature for a whole count(r)y).

Having a WWW:LINK Resource check for Exceptions via issue #19 was a first step, but via this issue we seek to do what one could call OGC-semantic-health-checks on Resources.

The basic idea is to assign a checklist to each Resource. As its name implies, this is a list of checks to be executed on that Resource during a Run. We can have predefined/default checks, like getting a Capabilities document successfully. As we can never be exhaustive in the kinds of tests, individual check-types are best implemented as plugins.

Suppose each check-type has a typeid. A simple checklist on a WWW:LINK Resource that holds a WMS GetMap request may then look like:

checklist: [
  {
    type: 'hascontenttype',
    properties: {
      content_type_is: 'image/jpeg'
    }
  },
  {
    type: 'keywordnotexists',
    properties: {
      keyword: 'ServiceException>'
    }
  }
]
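A minimal sketch of how such a checklist could be evaluated, assuming the response has already been fetched and reduced to a content type and a body string. The names `CHECK_TYPES` and `run_checklist` and the function signatures are hypothetical and not taken from GHC.

```python
# Hypothetical sketch: evaluate a checklist against an already-fetched response.
# None of these names come from GeoHealthCheck itself.

def has_content_type(content_type, body, properties):
    # Compare only the media type, ignoring parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == properties["content_type_is"].lower()


def keyword_not_exists(content_type, body, properties):
    return properties["keyword"] not in body


# Registry mapping a check typeid to its implementation.
CHECK_TYPES = {
    "hascontenttype": has_content_type,
    "keywordnotexists": keyword_not_exists,
}


def run_checklist(checklist, content_type, body):
    """Return a list of (typeid, passed) tuples, one per check."""
    results = []
    for check in checklist:
        func = CHECK_TYPES[check["type"]]
        passed = func(content_type, body, check["properties"])
        results.append((check["type"], passed))
    return results


checklist = [
    {"type": "hascontenttype", "properties": {"content_type_is": "image/jpeg"}},
    {"type": "keywordnotexists", "properties": {"keyword": "ServiceException>"}},
]

# A WMS that answers with an XML exception report instead of an image
# fails both checks.
xml_error = (
    "<ServiceExceptionReport><ServiceException>bad layer"
    "</ServiceException></ServiceExceptionReport>"
)
print(run_checklist(checklist, "application/vnd.ogc.se_xml", xml_error))
```

Keeping the typeid-to-function mapping in one dict is what would let new check-types be registered without touching the runner.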

For OGC Resources based on an endpoint we need more parameters. For example, for an OGC:WMS endpoint Resource, to check if an image is returned one needs:

checklist: [
  {
    type: 'hascontenttype',
    properties: {
      request: 'GetMap',
      service: 'WMS',
      layers: 'layerN',
      version: '1.1.1',
      bbox: [4.83,52.29,4.87,52.32],
      width: 240,
      height: 320,
      format: 'image/jpeg',
      exceptions: 'application/vnd.ogc.se_xml',
      content_type_is: 'image/jpeg'
    }
  },
  {
    type: 'keywordnotexists',
    properties: {
      .
      .
    }
  }
]

Most parameters/properties are needed for building the WMS GetMap request. We may need to define requests separately, such that each is issued once, and then run the checklist on the response. This would give WWW:LINK and OGC:* Resources a similar implementation. Probably for OGC:* Resources a user needs to provide a list of requests per Endpoint first, e.g. from request templates, and then compose a checklist with parameters for these requests.
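To illustrate the separation of request parameters from check parameters, here is a rough sketch using only the Python standard library. The helper names `split_properties` and `build_getmap_url`, the `CHECK_KEYS` set and the `example.com` endpoint are assumptions made for the example.

```python
from urllib.parse import urlencode

# Keys that describe the check itself rather than the GetMap request
# (hypothetical convention for this sketch).
CHECK_KEYS = {"content_type_is"}


def split_properties(properties):
    """Separate request parameters from check parameters."""
    request = {k: v for k, v in properties.items() if k not in CHECK_KEYS}
    check = {k: v for k, v in properties.items() if k in CHECK_KEYS}
    return request, check


def build_getmap_url(endpoint, request_params):
    """Build a GetMap URL; the bbox list becomes a comma-separated string."""
    query = dict(request_params)
    query["bbox"] = ",".join(str(v) for v in query["bbox"])
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(query)


properties = {
    "request": "GetMap", "service": "WMS", "layers": "layerN",
    "version": "1.1.1", "bbox": [4.83, 52.29, 4.87, 52.32],
    "width": 240, "height": 320, "format": "image/jpeg",
    "exceptions": "application/vnd.ogc.se_xml",
    "content_type_is": "image/jpeg",
}

request_params, check_params = split_properties(properties)
url = build_getmap_url("http://example.com/wms", request_params)
print(url)
print(check_params)
```

Note that a real WMS 1.1.1 GetMap request also requires SRS and STYLES parameters, which the property list above leaves out.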

Many more checks can be thought of: minimum filesize (to prevent blank images), feature count, etc. For OGC-specific XML-based services this would entail response parsing according to XML schemas via OWSLib, so there is less need for regexes etc. We can start simple with the WWW:LINK keyword check using the checklist method.
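As an illustration of two such checks, the sketch below reads the feature count from a WFS `resultType=hits` response and tests a minimum response size. It uses the standard library's `xml.etree.ElementTree` rather than OWSLib to stay self-contained; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def feature_count(xml_text):
    """Return the count reported on the FeatureCollection root, or None.

    WFS 1.1.0 reports it as numberOfFeatures, WFS 2.0 as numberMatched
    (which may also be the string "unknown").
    """
    root = ET.fromstring(xml_text)
    for attr in ("numberOfFeatures", "numberMatched"):
        value = root.get(attr)
        if value is not None and value.isdigit():
            return int(value)
    return None


def feature_count_greater_than(xml_text, minimum):
    count = feature_count(xml_text)
    return count is not None and count > minimum


def min_size_ok(body_bytes, min_bytes):
    """Crude guard against blank images: very small responses are suspicious."""
    return len(body_bytes) >= min_bytes


hits = (
    '<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs" '
    'numberOfFeatures="0"/>'
)
print(feature_count(hits), feature_count_greater_than(hits, 0))
```

The size threshold for a "blank" image depends on format and dimensions, so it would have to be a per-check parameter.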


justb4 · 2017-02-08T16:15:25Z

While writing the above issue I realized that a more minimal first implementation could let a user extend an OGC:* Resource with daughter sample-requests and do a basic check for their success/failure. This would require forms to fill in parameters, possibly selecting from values in the GetCapabilities and Describe* responses.

justb4 · 2017-02-09T11:28:27Z

Further thinking: both Requests (on an OGC-service-endpoint) and Checks (on Requests) are best implemented using a plugin system.

As for Requests, I have positive experience using OGC Request Templates, either via plain Python str format() or Jinja2. See for example SOS Templates I used for SOS-T publication. Each Template is a complete request with symbolic parameters in {parameter}. At runtime the {parameter}s are substituted using a dict (key/value pairs) of actual values. See for example InsertSensor.
This mechanism could apply to both GET and POST requests. A WMS GetMap (GET) template for an OGC:WMS endpoint could look like:

BBOX={bbox}&WIDTH={width}&HEIGHT={height}&SRS={srs}&
LAYERS={layers}&STYLES=&FORMAT={format}&SERVICE=WMS&
VERSION={version}&REQUEST=GetMap&EXCEPTIONS=application/vnd.ogc.se_xml

The advantage is that we need only one Request-handling mechanism. The dict of values will need to be filled when a user adds a Request via the GUI in add.html and will be stored in the database. There's a challenge how to obtain value-ranges like SRS/CRS from the Endpoint's metadata.
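A small sketch of the substitution step with plain `str.format()`, using the GetMap template above. The parameter values and the choice to percent-encode them with `urllib.parse.quote` are assumptions for the example.

```python
from urllib.parse import quote

WMS_GETMAP_TEMPLATE = (
    "BBOX={bbox}&WIDTH={width}&HEIGHT={height}&SRS={srs}&"
    "LAYERS={layers}&STYLES=&FORMAT={format}&SERVICE=WMS&"
    "VERSION={version}&REQUEST=GetMap&EXCEPTIONS=application/vnd.ogc.se_xml"
)

# Example values a user might have entered in the GUI (illustrative only).
params = {
    "bbox": "4.83,52.29,4.87,52.32",
    "width": 240,
    "height": 320,
    "srs": "EPSG:4326",
    "layers": "layerN",
    "format": "image/jpeg",
    "version": "1.1.1",
}

# Percent-encode the values so they are safe inside a query string;
# commas and colons are left readable.
encoded = {key: quote(str(value), safe=",:") for key, value in params.items()}
query = WMS_GETMAP_TEMPLATE.format(**encoded)
print(query)
```

A parameter missing from the dict makes `format()` raise `KeyError`, which gives an early signal that a stored Request is incomplete.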

Each Request template would need to supply at least:

unique symbolic identifier, e.g. wms_getmap
Resource type, e.g. OGC:WMS
method (GET or POST, maybe even others)
description (multilang?)
parameter-names and -types
the template text string itself

In add.html, dependent on the Resource type a user can add one or more applicable Requests, each using a form generated from the above parameter-names and -types. As for the database, the simplest is to have a Request-table with at least the columns: request_type (plugin-id), resource_id (parent Resource), parameters (map of values to substitute). On running the health-checks GHC will query all Requests for each Resource, reading each related template string and substituting the values from the parameters etc.
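The Request table and the lookup at run time could be sketched roughly as below, with an in-memory SQLite database standing in for the GHC database. The table layout follows the columns named above; the shortened template, the `REQUEST_TEMPLATES` dict and the JSON encoding of `parameters` are assumptions for illustration.

```python
import json
import sqlite3

# Hypothetical template registry: one entry per Request plugin.
REQUEST_TEMPLATES = {
    "wms_getmap": {
        "resource_type": "OGC:WMS",
        "method": "GET",
        # Shortened template, for illustration only.
        "template": "LAYERS={layers}&FORMAT={format}&SERVICE=WMS"
                    "&VERSION={version}&REQUEST=GetMap",
    },
}

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE request ("
    "id INTEGER PRIMARY KEY, request_type TEXT, "
    "resource_id INTEGER, parameters TEXT)"
)
db.execute(
    "INSERT INTO request (request_type, resource_id, parameters) VALUES (?, ?, ?)",
    ("wms_getmap", 1, json.dumps(
        {"layers": "layerN", "format": "image/jpeg", "version": "1.1.1"})),
)


def requests_for_resource(conn, resource_id):
    """Return (method, rendered request) pairs for one Resource."""
    rows = conn.execute(
        "SELECT request_type, parameters FROM request "
        "WHERE resource_id = ? ORDER BY id",
        (resource_id,),
    )
    rendered = []
    for request_type, parameters in rows:
        spec = REQUEST_TEMPLATES[request_type]
        text = spec["template"].format(**json.loads(parameters))
        rendered.append((spec["method"], text))
    return rendered


print(requests_for_resource(db, 1))
```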

justb4 · 2017-02-09T11:42:06Z

As for Checks we could apply a similar mechanism as Requests: plugins that implement a general mechanism, for example:

contains_keyword (check if response contains given keyword)
not_contains_keyword (check if response does not contain given keyword)
has_content_type (check if response has given content type)
response_time_less_than (QoS check on response time)
feature_count_greater_than (WFS feature count greater than number/0)
etc

Possibly we need a way to indicate an outcome or Verdict of the Checks.

The implementation of each Check plugin would be a Python function (or class) with a fixed interface. Also here parameters apply that a user needs to supply in add.html and edit in resource.html. All very similar to the Requests implementation: a Check table would have at least the following columns: request_id, parameters (dict/map, e.g. the keyword, or feature count etc). Basically the Check table would supply the CheckList for each Request. The Run table also needs to be extended in order to know the result for each Resource/Request/Check combination.
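One possible shape for that fixed interface is sketched below: every Check function takes a response and a parameter dict and returns a small result record, and an overall verdict is derived from the results. `Response`, `CheckResult`, `CHECKS` and `run_checks` are hypothetical names, not GHC code.

```python
from collections import namedtuple

# Minimal stand-ins for an HTTP response and a per-check outcome.
Response = namedtuple("Response", "content_type text elapsed_seconds")
CheckResult = namedtuple("CheckResult", "check success message")


def contains_keyword(response, parameters):
    found = parameters["keyword"] in response.text
    message = "" if found else "keyword missing: %s" % parameters["keyword"]
    return CheckResult("contains_keyword", found, message)


def not_contains_keyword(response, parameters):
    found = parameters["keyword"] in response.text
    message = "keyword present: %s" % parameters["keyword"] if found else ""
    return CheckResult("not_contains_keyword", not found, message)


def response_time_less_than(response, parameters):
    ok = response.elapsed_seconds < parameters["seconds"]
    message = "" if ok else "took %.2fs" % response.elapsed_seconds
    return CheckResult("response_time_less_than", ok, message)


# Registry keyed by function name, so stored Check records can refer to it.
CHECKS = {
    func.__name__: func
    for func in (contains_keyword, not_contains_keyword, response_time_less_than)
}


def run_checks(response, check_records):
    """check_records mimics Check table rows: (check name, parameters)."""
    results = [CHECKS[name](response, params) for name, params in check_records]
    verdict = all(result.success for result in results)
    return verdict, results


response = Response("text/xml", "<Title>Demo WMS</Title>", 0.4)
records = [
    ("contains_keyword", {"keyword": "<Title>"}),
    ("not_contains_keyword", {"keyword": "ServiceException"}),
    ("response_time_less_than", {"seconds": 2}),
]
verdict, results = run_checks(response, records)
print(verdict, [r.check for r in results])
```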

I realize the above is not a quick&dirty implementation and will require quite some development time (hard to estimate, 5-10 days?) but could be preserved once we move to v2. For example the Requests and Checks could be managed via a REST API.

justb4 · 2017-02-09T13:00:58Z

After discussion on Gitter with @tomkralidis :

each Resource (Endpoint) has 1..N Requests, each Request 1..N Checks
a generic GHC test runner will execute these
makes sense to bundle Request (one or more tbd) and applicable Checks in single plugin
finding plugins: modules via PYTHONPATH (viz pycsw and Stetl) instead of hard-coded dir-paths
may make sense to have a separate geohealthcheck-plugins GH repo
Plugin-instances/settings could be managed via a REST API (a.o. called by GUI)
leave freedom to developers to implement a GHC Plugin any way they want
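Finding plugins via the PYTHONPATH can be sketched with `importlib`: a plugin is named by its dotted path and resolved at run time. The function name `load_class` is hypothetical, and a standard-library class stands in for a real plugin class so the example runs anywhere.

```python
import importlib


def load_class(dotted_name):
    """Resolve 'package.module.ClassName' to the class object.

    The module only needs to be importable, i.e. somewhere on the
    PYTHONPATH; no directory paths are hard-coded.
    """
    module_name, _, class_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError("expected a dotted name, got %r" % dotted_name)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# In a real setup this list would hold plugin class names from the config;
# here a standard-library class is used as a stand-in.
plugin_names = ["collections.OrderedDict"]
plugins = [load_class(name) for name in plugin_names]
print([cls.__name__ for cls in plugins])
```

A misspelled module surfaces as `ImportError` and a missing class as `AttributeError`, so configuration mistakes show up at load time.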

justb4 · 2017-02-12T16:40:14Z

After some thought the above commit introduces the key classes/framework for plugins to deal with the above requirements. The key concept herein is that of a Probe and its implementation as a Probe base class and plugins as classes derived from Probe.

A Probe embeds a single Request with multiple Checks and result arbitration. Via a Probe's class-variables (capitals) most of the specification can be done. In most cases a plugin-author only needs to provide these variables. But still there is the freedom to override any of the Probe base class methods. Requests are driven from REQUEST_TEMPLATES. Actual parameters for a Probe are specified by the user in the GUI and stored in the DB as Request and Check records.

There are the following aspects/phases in this concept:

Probe authoring: classes derived from Probe, plus optional Check functions
Probe configuration: in main_config.py (and site_config.py) the available Probe classes should be listed in the GHC_PLUGINS array and must be findable on the PYTHONPATH.
Resource editing (add, edit, delete): for each Resource: select Probes, parameterize Probes
Resource editing (add, edit, delete): for each Probe: select Checks, parameterize Checks
Resource editing: all Probe/Check identifiers with their parameters are stored in DB
GHC Run: read Probe/Check records (via Resource), instantiate Probe (via Factory) with config
For each Probe: Run the Probe, do the Checks, obtain and store Result in Run table
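A heavily simplified sketch of the Probe idea follows: a base class driven by class variables, and a derived plugin that only sets those variables. This is not the actual GHC Probe class; the variable names `NAME`, `REQUEST_TEMPLATE` and `CHECKS`, the injected `fetch` function and the result dict are assumptions made to keep the example self-contained.

```python
class Probe:
    """Base class: one Request, several Checks, one arbitrated result."""

    NAME = "base"
    REQUEST_TEMPLATE = ""   # query string with {parameter} placeholders
    CHECKS = []             # functions taking the response body, returning bool

    def __init__(self, parameters, fetch):
        self.parameters = parameters
        self.fetch = fetch  # callable(url) -> body; injected for testability

    def build_request(self, endpoint):
        return endpoint + self.REQUEST_TEMPLATE.format(**self.parameters)

    def run(self, endpoint):
        body = self.fetch(self.build_request(endpoint))
        checks = [(check.__name__, check(body)) for check in self.CHECKS]
        success = all(passed for _, passed in checks)
        return {"probe": self.NAME, "success": success, "checks": checks}


def has_title(body):
    return "<Title>" in body


def no_service_exception(body):
    return "ServiceException" not in body


class WmsGetCapabilities(Probe):
    """A plugin author only fills in the class variables."""

    NAME = "wms_getcaps"
    REQUEST_TEMPLATE = "?SERVICE=WMS&VERSION={version}&REQUEST=GetCapabilities"
    CHECKS = [has_title, no_service_exception]


def fake_fetch(url):
    # Stand-in for an HTTP GET, so the sketch runs without a network.
    return "<WMS_Capabilities><Title>Demo</Title></WMS_Capabilities>"


probe = WmsGetCapabilities({"version": "1.3.0"}, fake_fetch)
result = probe.run("http://example.com/wms")
print(result)
```

In the phases listed above, the parameter dict would come from the stored Probe/Check records and the returned result would be written to the Run table.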

The above concept (running Probes) is currently being tested via unit tests in the tests dir.

justb4 · 2017-03-28T08:22:31Z

Implementation is progressing. One may check at our new "devserver" (that runs "dev branches", edge development) at http://dev.geohealthcheck.org. Before merging into master, there are still some additions to be made:

automatic default Probe/Check assignment on Resource creation (e.g. OGC Capabilities)
default Check enablement in Probe.CHECKS_AVAIL
database migration/versioning: candidates are Alembic with Flask-Migrate, or SQLAlchemy-migrate. The latter seems more lightweight...

And Nice to Have:

Helpers for (OWS) Parameters like bbox (via map) and other resources like Layers (WMS)
Run report in email notification
Formatted Run Reports, via REST API
More Probe and Check GHC Plugins
More documentation/tutorials on Probe/Check architecture and development


justb4 · 2017-05-02T10:30:43Z

A PR #93 was created and just merged into master branch.

justb4 · 2017-05-03T12:47:44Z

Result is running on http://demo.geohealthcheck.org
Doc on Plugins on: http://docs.geohealthcheck.org/en/latest/plugins.html

As #93 shows, also as "side-effects" (like DB upgrade automation), many changes went in under this issue:

Plugins: extensible Probes and Checks to perform healthchecks on Resources (endpoints)
UI changes to assign and configure Probes/Checks per Resource (resource-edit page)
reporting: more extensive reports for Probe/Check results
many Plugins for Probe/Checks available for common use cases
config: new params: GHC_PLUGINS listing Probes/Checks modules and classes avail
config: new params: GHC_PROBE_DEFAULTS default Probe class to assign to Resource on Resource-create
database upgrade via Flask-Migrate/Alembic and invoked by end-user via paver upgrade
GeoHealthCheck/migrations dir contains all migrations, for any pre-0.2.0 version
Flask-Script support: script manage.py contains a command processor for various DB management tasks related to migrations and upgrading
more extensive documentation, plus tutorial and API docs for Plugin development
unit tests: data loading for fixtures, more tests added, executed via Travis on commit
new Resource type: OGC:STA, support for OGC SensorThings API, including STA Probe
more robust DB Session Mgt, in particular for PostgreSQL deadlocks
version: 0.2.0 (prev was 0.1.0)

Only "gotcha": with an existing DB, all Probes/Checks need to be added manually.
Closing this issue now, any problems/fixes via new issues.

justb4 added the enhancement label Feb 8, 2017

justb4 self-assigned this Feb 8, 2017

justb4 mentioned this issue Feb 8, 2017

Monitor alternative responses from returned from a resouce #19

Closed

justb4 added a commit that referenced this issue Apr 10, 2017

Merge pull request #91 from geopython/fix_89

c789acd

fixes #89 until we upgrade to Flask-SQLAlchemy 2.2. This fix is also used for changes under #82 which includes more extensive DB-testing in Travis build.

justb4 closed this as completed May 3, 2017

justb4 mentioned this issue May 3, 2017

Enhancement: Support tags or some other filtering mechanism #75

Closed
