Provide OGC-specific/semantic health-checks #82

Closed
justb4 opened this issue Feb 8, 2017 · 8 comments

@justb4
Member

justb4 commented Feb 8, 2017

In GHC, OGC Resources are checked for a successful GetCapabilities response (a <title> element), but that response may even come from a static file. OGC services on that endpoint can fail for many reasons. Usually one notices on Get-requests (WMS GetMap, WFS GetFeature, etc.) that the service is "unhealthy" without any hard failure, e.g. a blank WMS image (with the exception rendered in the image), a zero WFS feature count, etc. I realize that generic, auto-generated, crawling Get* requests are tricky to implement, and that random OWS requests can have unwanted performance impacts (think of a GetFeature for a whole count(r)y).

Having a WWW:LINK Resource check for Exceptions (issue #19) was a first step; with this issue we aim at what one could call OGC-semantic health checks on Resources.

The basic idea is to assign a checklist to each Resource. As its name implies, this is a list of checks to be executed on that Resource during a Run. We can have predefined/default checks, like successfully getting a Capabilities document. As we can never be exhaustive in the kinds of tests, individual check types are best implemented as plugins.

Suppose each check type has a type id. A simple checklist for a WWW:LINK Resource pointing to a WMS GetMap request may then look like:

checklist: [
  {
    type: 'hascontenttype',
    properties: {
      content_type_is: 'image/jpeg'
    }
  },
  {
    type: 'keywordnotexists',
    properties: {
      keyword: 'ServiceException>'
    }
  }
]
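To make the checklist idea concrete, here is a minimal Python sketch that evaluates a checklist of the shape above against a response. The type ids `hascontenttype` and `keywordnotexists` come from the snippet; the `Response` class and the function names are hypothetical.

```python
# Sketch only: evaluate a checklist of the shape above against a response.
# The check type ids come from the issue; Response and the function names
# are hypothetical.
from dataclasses import dataclass


@dataclass
class Response:
    content_type: str  # value of the Content-Type header
    text: str          # response body


def has_content_type(response, properties):
    # Compare only the media type, ignoring parameters such as "; charset=...".
    media_type = response.content_type.split(';')[0].strip().lower()
    return media_type == properties['content_type_is'].lower()


def keyword_not_exists(response, properties):
    # Pass when the keyword does not occur anywhere in the body.
    return properties['keyword'] not in response.text


CHECK_TYPES = {
    'hascontenttype': has_content_type,
    'keywordnotexists': keyword_not_exists,
}


def run_checklist(response, checklist):
    # One (type, passed) tuple per check, in checklist order.
    return [(check['type'], CHECK_TYPES[check['type']](response, check['properties']))
            for check in checklist]


checklist = [
    {'type': 'hascontenttype', 'properties': {'content_type_is': 'image/jpeg'}},
    {'type': 'keywordnotexists', 'properties': {'keyword': 'ServiceException>'}},
]
```

For example, `run_checklist(Response('image/jpeg', ''), checklist)` returns `[('hascontenttype', True), ('keywordnotexists', True)]`.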

For OGC Resources based on an endpoint we need more parameters. For example, to check whether an OGC:WMS endpoint Resource returns an image, one needs:

checklist: [
  {
    type: 'hascontenttype',
    properties: {
      request: 'GetMap',
      service: 'WMS',
      layers: 'layerN',
      version: '1.1.1',
      bbox: [4.83,52.29,4.87,52.32],
      width: 240,
      height: 320,
      format: 'image/jpeg',
      exceptions: 'application/vnd.ogc.se_xml',
      content_type_is: 'image/jpeg'
    }
  },
  {
    type: 'keywordnotexists',
    properties: {
      ...
    }
  }
]

Most of these parameters/properties are needed to build the WMS GetMap request. We may need to define requests separately, so that each is issued once and the checklist is then run on its response. This would give WWW:LINK and OGC:* Resources a similar implementation. For OGC:* Resources a user probably needs to provide a list of requests per Endpoint (e.g. starting from request templates) and then compose a checklist with parameters for these requests.
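A minimal sketch of "issue the request once, then run the checklist": the properties of one entry are split into request parameters and check parameters, and the GetMap URL is built from the former. `REQUEST_KEYS`, the helper names and the endpoint URL are assumptions for illustration.

```python
# Sketch: split one checklist entry's properties into request parameters and
# check parameters, then build the WMS GetMap URL once.
# REQUEST_KEYS, the helper names and the endpoint are hypothetical.
from urllib.parse import urlencode

REQUEST_KEYS = ('request', 'service', 'layers', 'version', 'bbox',
                'width', 'height', 'format', 'exceptions')


def split_properties(properties):
    request_params = {k: v for k, v in properties.items() if k in REQUEST_KEYS}
    check_params = {k: v for k, v in properties.items() if k not in REQUEST_KEYS}
    return request_params, check_params


def build_get_url(endpoint, request_params):
    query = {}
    for name, value in request_params.items():
        if isinstance(value, (list, tuple)):
            # bbox [minx, miny, maxx, maxy] -> "minx,miny,maxx,maxy"
            value = ','.join(str(v) for v in value)
        query[name.upper()] = value
    # Keep ',' and '/' readable (BBOX, FORMAT); everything else is percent-encoded.
    return endpoint + '?' + urlencode(query, safe=',/')


properties = {
    'request': 'GetMap', 'service': 'WMS', 'layers': 'layerN', 'version': '1.1.1',
    'bbox': [4.83, 52.29, 4.87, 52.32], 'width': 240, 'height': 320,
    'format': 'image/jpeg', 'exceptions': 'application/vnd.ogc.se_xml',
    'content_type_is': 'image/jpeg',
}
request_params, check_params = split_properties(properties)
url = build_get_url('https://example.org/wms', request_params)
```

Only `content_type_is` ends up in `check_params`. Note that a WMS 1.1.1 GetMap also requires SRS and STYLES, which the example properties do not list.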

Many more checks can be thought of: minimum file size (to catch blank images), feature count, etc. For OGC-specific XML-based services this would entail parsing responses according to their XML schemas via OWSLib, so there is less need for regexes etc. We can start simple with the WWW:LINK keyword check using the checklist method.
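Two of the checks named above could be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the standard library's ElementTree stands in for the OWSLib-based parsing the issue suggests.

```python
# Sketch: a minimum response size check (to catch blank images) and a WFS
# feature count check. Function names are hypothetical; ElementTree stands in
# for OWSLib-based parsing.
import xml.etree.ElementTree as ET


def min_filesize(body: bytes, min_bytes: int) -> bool:
    # A nearly empty image is often a blank (or in-image exception) response.
    return len(body) >= min_bytes


def local_name(tag: str) -> str:
    # '{http://www.opengis.net/wfs/2.0}member' -> 'member'
    return tag.rsplit('}', 1)[-1]


def wfs_feature_count(body: bytes) -> int:
    root = ET.fromstring(body)
    # WFS 2.0 reports numberMatched on the FeatureCollection (it may be 'unknown').
    matched = root.get('numberMatched')
    if matched is not None and matched != 'unknown':
        return int(matched)
    # Otherwise count member elements (WFS 2.0 'member', GML 'featureMember').
    return sum(1 for child in root
               if local_name(child.tag) in ('member', 'featureMember'))


def feature_count_greater_than(body: bytes, minimum: int = 0) -> bool:
    return wfs_feature_count(body) > minimum


counted = (b'<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs/2.0" '
           b'numberMatched="3" numberReturned="0"/>')
listed = (b'<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs/2.0" '
          b'numberMatched="unknown"><wfs:member/><wfs:member/></wfs:FeatureCollection>')
```

Here `wfs_feature_count(counted)` reads the `numberMatched` attribute, while `listed` falls back to counting its two `wfs:member` children.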

@justb4
Member Author

justb4 commented Feb 8, 2017

While writing the above issue I realized that a more minimal first implementation could let a user extend an OGC:* Resource with child sample requests and do a basic success/failure check on them. This would require forms to fill in parameters, possibly selecting values from the GetCapabilities and Describe* responses.

@justb4
Member Author

justb4 commented Feb 9, 2017

Further thinking: both Requests (on an OGC-service-endpoint) and Checks (on Requests) are best implemented using a plugin system.

As for Requests, I have positive experience with OGC Request Templates, using either plain Python str.format() or Jinja2. See for example the SOS Templates I used for SOS-T publication. Each Template is a complete request with symbolic parameters in {parameter}. At runtime the {parameter}s are substituted using a dict (key/value pairs) of actual values. See for example InsertSensor.
This mechanism could apply to both GET and POST requests. A WMS GetMap (GET) template for an OGC:WMS endpoint could look like:

BBOX={bbox}&WIDTH={width}&HEIGHT={height}&SRS={srs}&
LAYERS={layers}&STYLES=&FORMAT={format}&SERVICE=WMS&
VERSION={version}&REQUEST=GetMap&EXCEPTIONS=application/vnd.ogc.se_xml
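A minimal sketch of that substitution with plain `str.format()`, joining the three template lines above into one query string. The helper name, the endpoint and the example values (including the SRS) are assumptions for illustration.

```python
# Sketch: substitute a dict of actual values into the GetMap template with
# str.format(). render_get_request and the example values are hypothetical.
WMS_GETMAP_TEMPLATE = (
    'BBOX={bbox}&WIDTH={width}&HEIGHT={height}&SRS={srs}&'
    'LAYERS={layers}&STYLES=&FORMAT={format}&SERVICE=WMS&'
    'VERSION={version}&REQUEST=GetMap&EXCEPTIONS=application/vnd.ogc.se_xml'
)


def render_get_request(endpoint, template, parameters):
    # Every {name} in the template must be in parameters, else KeyError.
    # Values are inserted verbatim; URL-encoding is left out of this sketch.
    return endpoint + '?' + template.format(**parameters)


parameters = {
    'bbox': '4.83,52.29,4.87,52.32', 'width': 240, 'height': 320,
    'srs': 'EPSG:4326', 'layers': 'layerN', 'format': 'image/jpeg',
    'version': '1.1.1',
}
url = render_get_request('https://example.org/wms', WMS_GETMAP_TEMPLATE, parameters)
```

A missing value surfaces as a `KeyError` at render time, which doubles as a simple check that the stored parameters match the template.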

The advantage is that we need only one Request-handling mechanism. The dict of values will need to be filled when a user adds a Request via the GUI in add.html, and will be stored in the database. One challenge is how to obtain value ranges, such as the SRS/CRS, from the Endpoint's metadata.

Each Request template would need to supply at least:

  • unique symbolic identifier, e.g. wms_getmap
  • Resource type, e.g. OGC:WMS
  • method (GET or POST, maybe even others)
  • description (multilang?)
  • parameter-names and -types
  • the template text string itself
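The metadata listed above maps naturally onto a small record type. The sketch below is hypothetical (class, method and template ids are assumptions) and also shows how parameter names and types could drive a form and how templates could be filtered by Resource type.

```python
# Sketch: per-template metadata as a dataclass, plus a filter for the templates
# applicable to a Resource type. All names here are hypothetical.
from dataclasses import dataclass


@dataclass
class RequestTemplate:
    identifier: str      # unique symbolic identifier, e.g. 'wms_getmap'
    resource_type: str   # e.g. 'OGC:WMS'
    method: str          # 'GET' or 'POST'
    description: str
    parameters: dict     # parameter name -> Python type, used to build the form
    template: str        # template text with {parameter} placeholders

    def form_fields(self):
        # (name, type name) pairs a GUI could turn into input fields.
        return [(name, typ.__name__) for name, typ in self.parameters.items()]

    def render(self, values):
        missing = set(self.parameters) - set(values)
        if missing:
            raise ValueError(f'missing parameters: {sorted(missing)}')
        for name, typ in self.parameters.items():
            if not isinstance(values[name], typ):
                raise TypeError(f'{name} must be {typ.__name__}')
        return self.template.format(**values)


def applicable(templates, resource_type):
    return [t for t in templates if t.resource_type == resource_type]


TEMPLATES = [
    RequestTemplate('wms_getcaps', 'OGC:WMS', 'GET', 'WMS GetCapabilities',
                    {'version': str},
                    'SERVICE=WMS&VERSION={version}&REQUEST=GetCapabilities'),
    RequestTemplate('wfs_getfeature', 'OGC:WFS', 'GET', 'WFS GetFeature (limited)',
                    {'typename': str, 'count': int},
                    'SERVICE=WFS&VERSION=2.0.0&REQUEST=GetFeature'
                    '&TYPENAMES={typename}&COUNT={count}'),
]
```

Validating types before substitution catches bad form input before any request is sent.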

In add.html, dependent on the Resource type a user can add one or more applicable Requests, each using a form generated from the above parameter-names and -types. As for the database, the simplest is to have a Request-table with at least the columns: request_type (plugin-id), resource_id (parent Resource), parameters (map of values to substitute). On running the health-checks GHC will query all Requests for each Resource, reading each related template string and substituting the values from the parameters etc.
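A minimal sketch of such a Request table, using an in-memory SQLite database from the standard library. The column names follow the paragraph; storing parameters as JSON text and the template registry are assumptions.

```python
# Sketch of the Request table described above (in-memory SQLite).
# Column names follow the issue; the JSON storage and TEMPLATES are hypothetical.
import json
import sqlite3

TEMPLATES = {
    'wms_getcaps': 'SERVICE=WMS&VERSION={version}&REQUEST=GetCapabilities',
    'wms_getmap': 'SERVICE=WMS&VERSION={version}&REQUEST=GetMap&LAYERS={layers}',
}


def create_db():
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'CREATE TABLE request ('
        ' id INTEGER PRIMARY KEY,'
        ' request_type TEXT NOT NULL,'    # plugin/template id
        ' resource_id INTEGER NOT NULL,'  # parent Resource
        ' parameters TEXT NOT NULL)'      # JSON map of values to substitute
    )
    return conn


def add_request(conn, resource_id, request_type, parameters):
    conn.execute(
        'INSERT INTO request (request_type, resource_id, parameters) VALUES (?, ?, ?)',
        (request_type, resource_id, json.dumps(parameters)))


def render_requests(conn, resource_id):
    # On a run: fetch all Requests of a Resource and substitute their values.
    rows = conn.execute(
        'SELECT request_type, parameters FROM request WHERE resource_id = ? ORDER BY id',
        (resource_id,))
    return [TEMPLATES[rtype].format(**json.loads(params)) for rtype, params in rows]


conn = create_db()
add_request(conn, 1, 'wms_getcaps', {'version': '1.3.0'})
add_request(conn, 1, 'wms_getmap', {'version': '1.3.0', 'layers': 'roads'})
add_request(conn, 2, 'wms_getcaps', {'version': '1.1.1'})
```

`render_requests(conn, 1)` then yields both rendered query strings for Resource 1, in insertion order.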

@justb4
Member Author

justb4 commented Feb 9, 2017

For Checks we could apply the same mechanism as for Requests: plugins that each implement a generic check, for example:

  • contains_keyword (check if response contains given keyword)
  • not_contains_keyword (check if response does not contain given keyword)
  • has_content_type (check if response has given content type)
  • response_time_less_than (QoS check on response time)
  • feature_count_greater_than (WFS feature count greater than a given number, e.g. 0)
    etc.
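One way to make such Checks pluggable is a registry keyed by check id. The sketch below is hypothetical (the `Response` shape, decorator and registry are assumptions) and covers the first four ids from the list; `feature_count_greater_than` is left out because it needs response parsing.

```python
# Sketch: Check plugins registered by id with a decorator. The ids come from
# the list above; Response, check() and CHECKS are hypothetical.
from dataclasses import dataclass


@dataclass
class Response:
    content_type: str
    text: str
    elapsed_ms: float   # measured response time in milliseconds


CHECKS = {}


def check(check_id):
    # Decorator: register a check function under its id.
    def register(func):
        CHECKS[check_id] = func
        return func
    return register


@check('contains_keyword')
def contains_keyword(response, keyword):
    return keyword in response.text


@check('not_contains_keyword')
def not_contains_keyword(response, keyword):
    return keyword not in response.text


@check('has_content_type')
def has_content_type(response, content_type):
    return response.content_type.split(';')[0].strip().lower() == content_type.lower()


@check('response_time_less_than')
def response_time_less_than(response, max_ms):
    return response.elapsed_ms < max_ms


def run_check(check_id, response, **parameters):
    return CHECKS[check_id](response, **parameters)


caps = Response('text/xml; charset=UTF-8', '<WMS_Capabilities version="1.3.0">', 120.0)
```

New checks register themselves just by being imported, so the runner only needs the id and the stored parameters.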

Possibly we need a way to indicate an outcome or Verdict of the Checks.

The implementation of each Check plugin would be a Python function (or class) with a fixed interface. Here too, a user needs to supply parameters in add.html and edit them in resource.html. This is all very similar to the Requests implementation: a Check table would have at least the columns request_id and parameters (a dict/map, e.g. the keyword or feature count). Basically, the Check table would supply the checklist for each Request. The Run table also needs to be extended to record the result for each Resource/Request/Check combination.
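A sketch of a fixed Check interface, a simple Verdict, and one Run row per Resource/Request/Check combination. All names are hypothetical, and "all Checks must pass" is just one possible arbitration rule.

```python
# Sketch: fixed Check interface, Verdict arbitration and Run rows per
# Resource/Request/Check. All class and function names are hypothetical.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class CheckResult:
    check_id: str
    success: bool
    message: str = ''


class Check(ABC):
    check_id = 'abstract'

    def __init__(self, **parameters):
        # Parameters as stored in the Check table, e.g. {'keyword': 'ServiceException'}
        self.parameters = parameters

    @abstractmethod
    def perform(self, response_text: str) -> CheckResult:
        """Evaluate one response and return a CheckResult."""


class NotContainsKeyword(Check):
    check_id = 'not_contains_keyword'

    def perform(self, response_text):
        keyword = self.parameters['keyword']
        found = keyword in response_text
        return CheckResult(self.check_id, not found, f'found {keyword!r}' if found else '')


class MinLength(Check):
    check_id = 'min_length'

    def perform(self, response_text):
        ok = len(response_text) >= self.parameters['length']
        return CheckResult(self.check_id, ok, '' if ok else 'response too short')


def verdict(results):
    # Simplest arbitration: the Request passes only if every Check passes.
    return all(r.success for r in results)


def run_rows(resource_id, request_id, results):
    # One row per Resource/Request/Check combination for an extended Run table.
    return [(resource_id, request_id, r.check_id, r.success, r.message) for r in results]


checks = [NotContainsKeyword(keyword='ServiceException'), MinLength(length=10)]
results = [c.perform('<ServiceExceptionReport/>') for c in checks]
```

Because the ABC forbids instantiating a Check without `perform`, a plugin that forgets to implement it fails early rather than during a Run.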

I realize the above is not a quick&dirty implementation and will require considerable development time (hard to estimate; 5-10 days?), but it could be preserved once we move to v2. For example, the Requests and Checks could be managed via a REST API.

@justb4
Member Author

justb4 commented Feb 9, 2017

After discussion on Gitter with @tomkralidis :

  • each Resource (Endpoint) has 1..N Requests, each Request 1..N Checks
  • a generic GHC test runner will execute these
  • makes sense to bundle a Request (one or more, tbd) and its applicable Checks in a single plugin
  • finding plugins: modules via PYTHONPATH (as in pycsw and Stetl) instead of hard-coded directory paths
  • may make sense to have a separate geohealthcheck-plugins GH repo
  • Plugin instances/settings could be managed via a REST API (called by the GUI, among others)
  • leave developers free to implement a GHC Plugin any way they want
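Finding plugins "via PYTHONPATH" can be done with `importlib`: a dotted name is split into module and class, and the module is imported from `sys.path` (which includes PYTHONPATH). The helper names are hypothetical, and standard-library classes stand in for real plugin classes.

```python
# Sketch: load plugin classes by dotted name via sys.path (which includes
# PYTHONPATH) instead of a hard-coded plugin directory. Names are hypothetical.
import importlib


def load_class(dotted_name, base=None):
    module_name, _, class_name = dotted_name.rpartition('.')
    if not module_name:
        raise ValueError(f'expected "package.module.Class", got {dotted_name!r}')
    module = importlib.import_module(module_name)   # searched along sys.path
    cls = getattr(module, class_name)
    if base is not None and not issubclass(cls, base):
        raise TypeError(f'{dotted_name} is not a subclass of {base.__name__}')
    return cls


def load_plugins(dotted_names, base=None):
    return {name: load_class(name, base) for name in dotted_names}


# Standard-library classes stand in for real plugin classes here.
plugins = load_plugins(['collections.OrderedDict', 'json.JSONDecoder'])
```

Passing a `base` class (e.g. a Probe base class) lets the loader reject configuration entries that point at unrelated classes.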

@justb4
Member Author

justb4 commented Feb 12, 2017

After some thought, the above commit introduces the key classes and framework for plugins that address these requirements. The key concept is that of a Probe, implemented as a Probe base class, with plugins as classes derived from Probe.

A Probe embeds a single Request with multiple Checks and result arbitration. Most of the specification is done via a Probe's class variables (in capitals). In most cases a plugin author only needs to provide these variables, but there is still the freedom to override any of the Probe base class methods. Requests are driven from REQUEST_TEMPLATES. The actual parameters for a Probe are specified by the user in the GUI and stored in the DB as Request and Check records.
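The class-variable approach can be sketched as below. This is not GHC's actual Probe API: the single-string `REQUEST_TEMPLATE` simplifies the `REQUEST_TEMPLATES` mentioned above, and the other variable names, methods and the injected `fetch` function are assumptions.

```python
# Sketch of the Probe idea: a base class driven by class variables and a plugin
# that only sets them. NOT GHC's actual API; REQUEST_TEMPLATE simplifies the
# issue's REQUEST_TEMPLATES, and `fetch` is a hypothetical injected HTTP function.


def not_contains_keyword(response_text, keyword):
    return keyword not in response_text


class Probe:
    NAME = 'abstract Probe'
    RESOURCE_TYPE = None
    REQUEST_METHOD = 'GET'
    REQUEST_TEMPLATE = ''
    CHECKS_AVAIL = {}          # check name -> check function

    def __init__(self, parameters, check_parameters, fetch):
        self.parameters = parameters              # Request record values
        self.check_parameters = check_parameters  # Check records: name -> params
        self.fetch = fetch                        # (method, url) -> response text

    def perform_request(self, endpoint):
        url = endpoint + '?' + self.REQUEST_TEMPLATE.format(**self.parameters)
        return self.fetch(self.REQUEST_METHOD, url)

    def run_checks(self, response_text):
        return {name: self.CHECKS_AVAIL[name](response_text, **params)
                for name, params in self.check_parameters.items()}

    def run(self, endpoint):
        # Result arbitration: the Probe succeeds only if all its Checks pass.
        outcomes = self.run_checks(self.perform_request(endpoint))
        return {'success': all(outcomes.values()), 'checks': outcomes}


class WmsGetCapsProbe(Probe):
    # A plugin author typically only fills in the class variables.
    NAME = 'WMS GetCapabilities'
    RESOURCE_TYPE = 'OGC:WMS'
    REQUEST_TEMPLATE = 'SERVICE=WMS&VERSION={version}&REQUEST=GetCapabilities'
    CHECKS_AVAIL = {'not_contains_keyword': not_contains_keyword}


probe = WmsGetCapsProbe(
    {'version': '1.3.0'},
    {'not_contains_keyword': {'keyword': 'ServiceException'}},
    fetch=lambda method, url: '<WMS_Capabilities version="1.3.0"/>',  # fake HTTP
)
result = probe.run('https://example.org/wms')
```

Injecting `fetch` keeps the sketch testable without a live service; a subclass can still override `perform_request` or `run` when class variables are not enough.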

There are the following aspects/phases in this concept:

  • Probe authoring: classes derived from Probe, plus optional Check functions
  • Probe configuration: in main_config.py (and site_config.py) the available Probe classes are listed in the GHC_PLUGINS array and must be findable on the PYTHONPATH
  • Resource editing (add, edit, delete): for each Resource: select Probes, parameterize Probes
  • Resource editing (add, edit, delete): for each Probe: select Checks, parameterize Checks
  • Resource editing: all Probe/Check identifiers with their parameters are stored in the DB
  • GHC Run: read Probe/Check records (via Resource), instantiate Probes (via a Factory) with their config
  • For each Probe: run the Probe, do the Checks, obtain and store the Result in the Run table
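The last two phases (instantiate via a Factory, run, store results) could look roughly like this. The registry, record layout, class names and the injected `fetch` are hypothetical stand-ins for the DB records and real HTTP.

```python
# Sketch of the run phase: read Probe records, instantiate Probes through a
# factory, run them and collect Run rows. All names are hypothetical.
from dataclasses import dataclass
from datetime import datetime, timezone


class Probe:
    def __init__(self, **parameters):
        self.parameters = parameters

    def run(self, endpoint, fetch):
        raise NotImplementedError


class KeywordProbe(Probe):
    def run(self, endpoint, fetch):
        return self.parameters['keyword'] in fetch(endpoint)


# Per the issue, available classes are listed in GHC_PLUGINS; a dict stands in.
PROBE_REGISTRY = {cls.__name__: cls for cls in (KeywordProbe,)}


def create_probe(class_name, parameters):
    # Factory: map a stored class name plus parameters to a Probe instance.
    try:
        cls = PROBE_REGISTRY[class_name]
    except KeyError:
        raise ValueError(f'unknown Probe class {class_name!r}') from None
    return cls(**parameters)


@dataclass
class RunRow:
    resource_id: int
    probe_class: str
    success: bool
    checked_at: datetime


def run_all(probe_records, endpoints, fetch):
    rows = []
    for rec in probe_records:
        probe = create_probe(rec['probe_class'], rec['parameters'])
        success = probe.run(endpoints[rec['resource_id']], fetch)
        rows.append(RunRow(rec['resource_id'], rec['probe_class'], success,
                           datetime.now(timezone.utc)))
    return rows


records = [
    {'resource_id': 1, 'probe_class': 'KeywordProbe', 'parameters': {'keyword': 'WMS_Capabilities'}},
    {'resource_id': 2, 'probe_class': 'KeywordProbe', 'parameters': {'keyword': 'WFS_Capabilities'}},
]
endpoints = {1: 'https://example.org/wms', 2: 'https://example.org/wfs'}
rows = run_all(records, endpoints, fetch=lambda url: '<WMS_Capabilities/>')
```

Keeping the factory separate from the runner means the stored records only need a class name and a parameter map.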

Running Probes, as described above, is being tested via unit tests in the tests dir.

@justb4
Member Author

justb4 commented Mar 28, 2017

Implementation is progressing. It can be checked on our new "devserver" (which runs "dev" branches, i.e. edge development) at http://dev.geohealthcheck.org. Before merging into master, some additions still need to be made:

  • automatic default Probe/Check assignment on Resource creation (e.g. OGC Capabilities)
  • default Check enablement in Probe.CHECKS_AVAIL
  • database migration/versioning: candidates are Alembic with Flask-Migrate, or SQLAlchemy-Migrate. The latter seems more lightweight...
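Automatic default assignment on Resource creation could be sketched as follows. The dict layouts and the class and check names are hypothetical, loosely modelled on the GHC_PROBE_DEFAULTS setting and the per-Probe CHECKS_AVAIL mentioned in this thread.

```python
# Sketch: assign a default Probe on Resource creation and enable only the
# Checks flagged as default. Dict layouts and names are hypothetical, loosely
# modelled on GHC_PROBE_DEFAULTS and Probe.CHECKS_AVAIL.
GHC_PROBE_DEFAULTS = {
    'OGC:WMS': {'probe_class': 'WmsGetCapsProbe'},
    'WWW:LINK': {'probe_class': 'HttpGetProbe'},
}

CHECKS_AVAIL = {
    'WmsGetCapsProbe': {
        'not_contains_exception': {'default': True,
                                   'parameters': {'keyword': 'ServiceException'}},
        'contains_title': {'default': True, 'parameters': {'keyword': '<Title>'}},
        'response_time_less_than': {'default': False, 'parameters': {'max_ms': 2000}},
    },
    'HttpGetProbe': {},
}


def default_probes(resource_type):
    # Returns the Probe/Check records to store for a newly created Resource.
    default = GHC_PROBE_DEFAULTS.get(resource_type)
    if default is None:
        return []
    probe_class = default['probe_class']
    checks = {name: dict(spec['parameters'])
              for name, spec in CHECKS_AVAIL.get(probe_class, {}).items()
              if spec.get('default', False)}
    return [{'probe_class': probe_class, 'checks': checks}]
```

Copying each parameter dict keeps later per-Resource edits from mutating the shared defaults.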

And Nice to Have:

  • Helpers for (OWS) Parameters like bbox (via map) and other resources like Layers (WMS)
  • Run report in email notification
  • Formatted Run Reports, via REST API
  • More Probe and Check GHC Plugins
  • More documentation/tutorials on Probe/Check architecture and development

justb4 added a commit that referenced this issue Apr 10, 2017
fixes #89 until we upgrade to Flask-SQLAlchemy 2.2. This fix is also used for changes under #82 which includes more extensive DB-testing in Travis build.
@justb4
Member Author

justb4 commented May 2, 2017

PR #93 was created and has just been merged into the master branch.

@justb4
Member Author

justb4 commented May 3, 2017

Result is running on http://demo.geohealthcheck.org
Doc on Plugins on: http://docs.geohealthcheck.org/en/latest/plugins.html

As #93 shows, many changes went in under this issue, some of them as "side effects" (like DB upgrade automation):

  • Plugins: extensible Probes and Checks to perform healthchecks on Resources (endpoints)
  • UI changes to assign and configure Probes/Checks per Resource (resource-edit page)
  • reporting: more extensive reports for Probe/Check results
  • many Plugins for Probe/Checks available for common use cases
  • config: new params: GHC_PLUGINS listing the available Probe/Check modules and classes
  • config: new params: GHC_PROBE_DEFAULTS default Probe class to assign to Resource on Resource-create
  • database upgrade via Flask-Migrate/Alembic and invoked by end-user via paver upgrade
  • GeoHealthCheck/migrations dir contains all migrations, for any pre-0.2.0 version
  • Flask-Script support: script manage.py contains a command processor
    for various DB management tasks related to migrations and upgrading
  • more extensive documentation, plus tutorial and API docs for Plugin development
  • unit tests: data loading for fixtures, more tests added, executed via Travis on commit
  • new Resource type: OGC:STA, support for OGC SensorThings API, including STA Probe
  • more robust DB Session Mgt, in particular for PostgreSQL deadlocks
  • version: 0.2.0 (prev was 0.1.0)

The only "gotcha": with an existing DB, all Probes/Checks need to be added manually.
Closing this issue now; please report any problems or fixes via new issues.
