Dalton Kell edited this page May 13, 2019 · 9 revisions

Compliance Checker Development

Installation in Editable Mode

compliance-checker depends on many external C libraries. The easiest way to install in editable mode on MS-Windows/OS X/Linux is with conda.

$ python gen_conda_requirements.py --groups requirements.txt --py3 > conda-requirements.txt
$ conda install -c conda-forge --file conda-requirements.txt --file test_requirements.txt
$ pip install -e .

Compliance Checker internals

Portions of the compliance checker are modeled after standard Python unit tests.

Checkers or CheckSuites are similar to a unit test class, and comprise all the checks pertaining to a compliance standard.

Checkers consist of check methods: methods of the class whose names begin with the string check_. Each check method is passed a DSPair object and is expected to return one of:

  • None (essentially a skip)
  • A Result object
  • An iterable (list) of Result objects
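
The three return shapes can be illustrated with a minimal, self-contained sketch. Everything in it is a stand-in rather than the Compliance Checker's own code: Result is a simplified namedtuple, the priority constants are placeholder numbers, the dataset is a plain dict instead of a DSPair, and run_checks is a hypothetical runner written only to show how methods beginning with check_ could be discovered and their results flattened.

```python
from collections import namedtuple

# Hypothetical stand-ins (assumptions, not the real classes or values).
Result = namedtuple("Result", ["priority", "value", "name", "msgs"])
HIGH, MEDIUM, LOW = 3, 2, 1


class WhizBangSketchCheck:
    """Illustrative checker: each method named check_* is one check."""

    def check_title(self, ds):
        # Shape 1: a single Result.
        has_title = "title" in ds
        msgs = [] if has_title else ["Global attribute title is missing"]
        return Result(HIGH, has_title, "has_title", msgs)

    def check_optional_comment(self, ds):
        # Shape 2: None, because this conditional check does not apply.
        if "comment" not in ds:
            return None
        return Result(LOW, bool(ds["comment"]), "comment_not_empty", [])

    def check_variables(self, ds):
        # Shape 3: an iterable of Results, one per variable.
        return [
            Result(MEDIUM, "standard_name" in attrs,
                   ("variable", name, "has_standard_name_attr"), [])
            for name, attrs in ds.get("variables", {}).items()
        ]


def run_checks(checker, ds):
    """Call every check_* method and flatten the three allowed return shapes."""
    results = []
    for attr in sorted(dir(checker)):
        if not attr.startswith("check_"):
            continue
        outcome = getattr(checker, attr)(ds)
        if outcome is None:
            continue                    # skipped check
        if isinstance(outcome, Result):
            results.append(outcome)     # single Result
        else:
            results.extend(outcome)     # iterable of Results
    return results


dataset = {
    "title": "demo",
    "variables": {
        "temperature": {"standard_name": "sea_water_temperature"},
        "depth": {},
    },
}
results = run_checks(WhizBangSketchCheck(), dataset)
```

Because the stand-in Result is itself a tuple, the runner tests for a single Result before treating the return value as an iterable.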

A DSPair object is an internal object. Depending on the type of file being inspected, it contains either a NetCDF Dataset object or an XML document loaded via etree, along with a Wicken Dogma object whose attributes inspect the other object.

A Result object is a container for data indicating success or failure of a check. It contains:

  • A priority (BaseCheck.HIGH, BaseCheck.MEDIUM, BaseCheck.LOW)
  • A value
    • A 2-tuple of (passed, total)
    • True (equivalent to (1, 1))
    • False (equivalent to (0, 1))
  • A computer-readable name of the check performed. This may be a single string, or a tuple of strings that can be grouped at each item along with other Results, e.g. ('variable', 'temperature', 'has_standard_name_attr')
  • An optional list of messages, typically used to indicate why something failed
  • An optional list of child Result objects (advanced usage)
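
As a rough illustration of how these values and names fit together, the sketch below normalizes True/False to (passed, total) tuples and sums scores at each level of a tuple name. Both functions are hypothetical helpers written for this page, and summing per name prefix is only one plausible reading of "grouped at each item"; the checker's own scoring may differ.

```python
from collections import defaultdict


def normalize_value(value):
    """Map a Result value onto a (passed, total) 2-tuple."""
    if value is True:
        return (1, 1)
    if value is False:
        return (0, 1)
    passed, total = value
    return (passed, total)


def group_scores(named_values):
    """Sum (passed, total) for every prefix of each name.

    named_values is an iterable of (name, value) pairs, where name is a
    string or a tuple of strings.
    """
    totals = defaultdict(lambda: [0, 0])
    for name, value in named_values:
        parts = (name,) if isinstance(name, str) else tuple(name)
        passed, total = normalize_value(value)
        for depth in range(1, len(parts) + 1):
            bucket = totals[parts[:depth]]
            bucket[0] += passed
            bucket[1] += total
    return {key: tuple(score) for key, score in totals.items()}


scores = group_scores([
    (("variable", "temperature", "has_standard_name_attr"), True),
    (("variable", "temperature", "has_units_attr"), False),
    (("variable", "salinity", "has_standard_name_attr"), (1, 2)),
    ("has_title", True),
])
```

Here scores[("variable", "temperature")] collects both temperature checks, while scores[("variable",)] rolls up every variable-level check.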

Contributing to the Compliance Checker

Contributing To Existing Checkers

Suppose you notice an issue in the CF checker: your dataset fails a check and you don't think it should. Open the cf.py file and search for the name of the failing check to find the method that produces that Result. You can then adjust the method to correct the issue: fork, commit, and send a PR for discussion. For your changes to be visible to the Compliance Checker, uninstall the currently installed version with pip uninstall compliance-checker and re-run python setup.py develop. Alternatively, run pip install -e . --no-cache-dir to install in editable mode without using a previously cached version.

Extending Base Classes

If you are creating a new checker within a built-in check suite (e.g. updating the IOOS Metadata Profile from v1.1 to v1.2), keep the old checker versions (each as its own class) and extend the base classes with the new information (hooray object-oriented design!). To ensure your checker is registered, the class must define the member register_checker = True. You will also need to edit the setup.py file and add the path to the new Checker class as a check suite:

    entry_points         = {
        'console_scripts': [
            'compliance-checker = cchecker:main'
        ],
        'compliance_checker.suites': [
            'cf = compliance_checker.cf.cf:CFBaseCheck',
            'acdd-1.1 = compliance_checker.acdd:ACDD1_1Check',
            'acdd-1.3 = compliance_checker.acdd:ACDD1_3Check',
            'ioos_sos = compliance_checker.ioos:IOOSBaseSOSCheck',
            'ioos-0.1 = compliance_checker.ioos:IOOS0_1Check',
            'ioos-1.1 = compliance_checker.ioos:IOOS1_1Check',
            'ioos-1.2 = compliance_checker.ioos:IOOS1_2Check',  # <-- entry point now exists, will discover the class
        ]
    },
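
The "keep the old class, extend it with the new information" approach might look roughly like the sketch below. The class names, the required_attrs tuple, and the attribute names in it are all hypothetical placeholders chosen for illustration; only the register_checker = True member comes from the text above.

```python
class BaseCheckSketch:
    """Hypothetical stand-in for the checker base class."""
    register_checker = False


class Profile1_1Sketch(BaseCheckSketch):
    """Older profile version: left untouched once released."""
    register_checker = True
    required_attrs = ("title", "summary")  # placeholder attribute names

    def check_required_attrs(self, attrs):
        # Returns a (passed, total) value plus a message per missing attribute.
        missing = [a for a in self.required_attrs if a not in attrs]
        total = len(self.required_attrs)
        msgs = [f"Global attribute {a} is missing" for a in missing]
        return (total - len(missing), total), msgs


class Profile1_2Sketch(Profile1_1Sketch):
    """Newer profile version: inherits every check and adds requirements."""
    register_checker = True
    required_attrs = Profile1_1Sketch.required_attrs + ("platform",)


attrs = {"title": "demo", "summary": "example dataset"}
old_value, old_msgs = Profile1_1Sketch().check_required_attrs(attrs)
new_value, new_msgs = Profile1_2Sketch().check_required_attrs(attrs)
```

The same dataset passes the older sketch profile in full but scores (2, 3) against the newer one, and both classes stay available side by side.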

Implementing Your Own Checker

(pending Plugin merge)

The Compliance Checker features a plugin system that allows you to author your own CheckSuite to suit your organization's needs. A CC Plugin is a separate Python project that the Compliance Checker will discover at run-time.

A basic and working example of a real CC Plugin can be found at ioos/cc-plugin-glider.

Getting Started

You'll need a setup.py. Follow the template set in cc-plugin-glider. Here's a basic example for a plugin that checks "whizbang" compliance:

from setuptools import setup, find_packages

from cc_plugin_whizbang import __version__


def readme():
    # Use the README as the long description shown on the package index
    with open('README.md') as f:
        return f.read()


# Read the install requirements, closing the file and skipping blank lines
with open('requirements.txt') as f:
    reqs = [line.strip() for line in f if line.strip()]

setup(name               = "cc-plugin-whizbang",
    version              = __version__,
    description          = "Compliance Checker WhizBang plugin",
    long_description     = readme(),
    license              = 'Apache License 2.0',
    author               = "You",
    author_email         = "you@example.com",
    url                  = "https://github.com/org/cc-plugin-whizbang",
    packages             = find_packages(),
    install_requires     = reqs,
    classifiers          = [
            'Development Status :: 5 - Production/Stable',
            'Intended Audience :: Developers',
            'Intended Audience :: Science/Research',
            'License :: OSI Approved :: Apache Software License',
            'Operating System :: POSIX :: Linux',
            'Programming Language :: Python',
            'Topic :: Scientific/Engineering',
        ],
    entry_points         = {
        'compliance_checker.suites': [
            'whizbang = cc_plugin_whizbang.whizbang:WhizBangCheck',
        ]
    }
)

The entry_points entry is the glue that allows Compliance Checker to find this package. When you install the plugin with pip install . or pip install -e . into the same environment as Compliance Checker, Compliance Checker will find your plugin.
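
To confirm that an installed plugin is visible, you can list the entry points registered under the compliance_checker.suites group with the standard library. This is a sketch for inspecting your environment, not a description of how the Compliance Checker itself loads suites; find_suites is a hypothetical helper name.

```python
from importlib.metadata import entry_points

GROUP = "compliance_checker.suites"


def find_suites(group=GROUP):
    """Return {suite name: 'module:Class'} for every entry point in the group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    # Only names and target strings are read; nothing is imported here.
    return {ep.name: ep.value for ep in selected}


suites = find_suites()
for name, target in sorted(suites.items()):
    print(f"{name} -> {target}")
```

If the whizbang plugin from the example above is installed in the same environment, a whizbang entry should appear in the listing; an empty listing means no package in the environment registers that group.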

Best Practices For Check Methods

  • If your check method is optional/conditional and does not apply, don't return a failure; just return None.
  • Use messages for failures, and be explicit: put values, variable names, etc. in them.
    • Good: "Variable temperature does not have a standard_name attribute"
    • Bad: "VARIABLE IS WRONG"
  • DON'T attach messages to passing checks/Results; they may confuse the end user.
  • There's a lot of flexibility in how you structure your Result objects. If you have a method that checks that a global attribute is present AND has a certain value, you can either return a list of two Result objects (likely using True/False for their values), OR return one Result object with a value of (1, 2) and a message saying something like "Attribute (attrname) present but value (the_value) was not (expected value)".
  • There are a number of helper/utility methods for CF-related checks to classify variables, etc. Look around cf/cf.py and cf/util.py for something that may suit your needs.
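
The two ways of structuring Results for a "present AND has a certain value" check can be compared side by side. As before, this is a sketch under assumptions: Result is a simplified namedtuple stand-in, MEDIUM is a placeholder priority, and attrs is a plain dict of global attributes rather than a real dataset object.

```python
from collections import namedtuple

# Hypothetical stand-ins (assumptions, not the real classes or values).
Result = namedtuple("Result", ["priority", "value", "name", "msgs"])
MEDIUM = 2


def check_attr_as_two_results(attrs, attrname, expected):
    """Option 1: one True/False Result for presence, one for the value."""
    present = attrname in attrs
    correct = present and attrs[attrname] == expected
    present_msgs = [] if present else [f"Attribute {attrname} is missing"]
    value_msgs = [] if correct else [
        f"Attribute {attrname} does not have the expected value {expected!r}"
    ]
    return [
        Result(MEDIUM, present, (attrname, "present"), present_msgs),
        Result(MEDIUM, correct, (attrname, "value"), value_msgs),
    ]


def check_attr_as_one_result(attrs, attrname, expected):
    """Option 2: a single Result scored out of two points."""
    if attrname not in attrs:
        return Result(MEDIUM, (0, 2), attrname,
                      [f"Attribute {attrname} is missing"])
    actual = attrs[attrname]
    if actual != expected:
        return Result(MEDIUM, (1, 2), attrname, [
            f"Attribute {attrname} present but value {actual!r} "
            f"was not {expected!r}"
        ])
    # Passing Results carry no messages.
    return Result(MEDIUM, (2, 2), attrname, [])


attrs = {"Conventions": "CF-1.6"}
two = check_attr_as_two_results(attrs, "Conventions", "CF-1.7")
one = check_attr_as_one_result(attrs, "Conventions", "CF-1.7")
passing = check_attr_as_one_result(attrs, "Conventions", "CF-1.6")
```

Both options report the same facts; the first gives each condition its own line in a report, while the second keeps a single entry with a partial score.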