Schema Formatters and Metadata Views

Date: Oct 15, 2014
Contacts: Jesse Eichar
Status: In Progress
Release: 3.0
Resources: Jesse Eichar
Ticket #: 650
Source code: https://github.com/jesseeichar/core-geonetwork/tree/formatter-functions
Funding: Swisstopo/Camptocamp

Overview

This proposal changes the formatter API in four ways:

Formatters are looked up in each schema plugin's formatter directory as well as in the data directory's formatter directory, so that all registered formatters are found.
Formatters can be grouped by placing several of them in a sub-directory of the root formatter directory. In this case the formatter id contains a / separator, for example package/identification. Note: a group directory cannot itself contain a formatter view file.
config.properties files can be shared by placing them in the root formatter directory. This is most useful for schema plugins, whose formatters will likely share most if not all configuration properties.
A Groovy-based alternative formatter dialect is added, which permits more terse and readable formatter files.

This proposal will also provide default metadata views based on the formatters framework for the default schemas.

Resources on formatters:

Technical Details:

Schema Plugin Formatters

Currently the <data_dir>/data/formatter directory is searched for directories containing a view.xsl file; each of these directories is considered to be a formatter plugin. This proposal will also search each schema plugin's formatter directory for formatters. These formatters will be the metadata viewers available for each schema.

Grouped Formatters

In some cases it makes sense to group related formatters. For example, the original ISO19139 viewer had a group of views based on the sections of the ISO19139 schema: metadata, identification, distribution, etc. This proposal will allow formatters to be grouped in a similar way by putting them together in a directory:

schema_plugins
    iso19139
        formatter
            package
                metadata
                identification
                spatial_rep
                distribution
                maintenance
                ...

Each formatter in a group will be assigned an id with a / separator separating the parts: package/metadata or package/identification for example.
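
The id scheme above can be sketched as a directory scan. This is only an illustration under stated assumptions: findFormatterIds is a hypothetical helper, not part of the formatter framework, and it assumes a directory counts as a formatter when it directly contains a view.xsl or view.groovy file.

```groovy
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper: a directory is treated as a formatter when it directly
// contains a view file; its id is its path relative to the formatter root,
// written with / separators.
List<String> findFormatterIds(Path root) {
    def viewFiles = ['view.xsl', 'view.groovy']
    def ids = []
    root.toFile().eachDirRecurse { File dir ->
        if (viewFiles.any { new File(dir, it).isFile() }) {
            ids << root.relativize(dir.toPath()).toString().replace(File.separator, '/')
        }
    }
    ids.sort()
}

// Build a throw-away directory tree that mimics a grouped layout.
def root = Files.createTempDirectory('formatter')
['package/metadata', 'package/identification', 'full_view'].each { id ->
    def dir = Files.createDirectories(root.resolve(id))
    Files.createFile(dir.resolve('view.groovy'))
}

def ids = findFormatterIds(root)
println ids
root.toFile().deleteDir()
```

The group directory (package) holds no view file of its own, so it does not appear as a formatter; only its children do.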

Shared files

Since most views in a schema plugin will share much of their configuration, configuration files such as config.properties and localization files can be placed in the root of the schema plugin's formatter directory, where all formatters in that schema plugin share them. In the case of config.properties, a particular formatter can override the common properties by including its own config.properties file. The two files are merged, with the formatter-specific file taking priority.
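
The merge rule can be illustrated with java.util.Properties. This is a minimal sketch, not the framework's implementation; the property keys (title, showEmpty) are made up for the example and the file contents are inlined as strings instead of being read from disk.

```groovy
// Hypothetical contents of the shared and the formatter-specific files.
def sharedText = '''
title=Default view
showEmpty=true
'''
def formatterText = '''
showEmpty=false
'''

// Parse a properties file held in a string.
def load = { String text ->
    def props = new Properties()
    props.load(new StringReader(text))
    props
}

def merged = new Properties()
merged.putAll(load(sharedText))     // shared values go in first
merged.putAll(load(formatterText))  // formatter-specific values overwrite them

println merged.getProperty('title')
println merged.getProperty('showEmpty')
```

Loading the shared file first and the formatter file second means that any key defined in both ends up with the formatter-specific value.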

Groovy Based Formatter

XSLT is rather verbose and many people find it extremely intimidating to approach. As an alternative syntax this proposal will provide a Groovy based DSL for defining the formatter.

A note about Groovy: Groovy is based on Java and virtually all Java syntax is also valid Groovy syntax, so a Java developer should find it easy to write Groovy code. Groovy adds many forms of syntactic sugar to Java, such as not having to declare variable types. In the case of XML processing, the dynamic nature of Groovy makes for a very clean API for creating and parsing XML. In addition, it is a fairly popular language in the Java community.

At its simplest, the Groovy formatter API lets you define element handlers; when a matching element is encountered, its handler is called to produce HTML. A handler can return a file, a string, or HTML built with the HtmlBuilder class.
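
The sample script later in this proposal builds HTML through f.html, which its comments describe as a wrapper around groovy.xml.MarkupBuilder. The following standalone sketch shows the builder on its own. The html closure here is a hypothetical stand-in for such a wrapper, not the framework's function.

```groovy
import groovy.xml.MarkupBuilder

// Hypothetical stand-in for a helper that runs a closure against a
// MarkupBuilder and returns the generated markup as a string.
def html = { Closure body ->
    def writer = new StringWriter()
    body(new MarkupBuilder(writer))
    writer.toString()
}

def result = html { builder ->
    builder.p('class': 'abstract') {
        span('class': 'label', 'Abstract')
        // Text passed as a tag value is escaped by the builder.
        span('class': 'value', 'Rivers & lakes <2014>')
    }
}
println result
```

Because the builder escapes text values, a handler that needs to embed already-rendered markup has to use mkp.yieldUnescaped, as one of the handlers in the sample script does.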

The metadata will be parsed with the Groovy XmlSlurper, and the syntax for selecting and processing elements will be similar to its API, although extra options will be added to make implementing a formatter simpler and cleaner.
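
For readers who have not used XmlSlurper, here is a small standalone sketch of its selection syntax. It makes two assumptions: the parser is created as not namespace aware so that elements are addressed by their prefixed names (how the framework configures its parser is not stated here), and the import is the Groovy 3+ location (on Groovy 2.x the class is groovy.util.XmlSlurper and needs no import). The metadata fragment is invented for the example.

```groovy
import groovy.xml.XmlSlurper   // groovy.util.XmlSlurper on Groovy 2.x

def xml = '''<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:abstract>
        <gco:CharacterString>Lakes of Switzerland</gco:CharacterString>
      </gmd:abstract>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>'''

// Assumption: not validating, not namespace aware, so prefixed names are used.
def md = new XmlSlurper(false, false).parseText(xml)

// Navigate by child name; quoted property names allow the ':' character.
def abstractEl = md.'gmd:identificationInfo'.'gmd:MD_DataIdentification'.'gmd:abstract'
println abstractEl.'gco:CharacterString'.text()

// '**' walks the tree depth first; find returns the first matching node.
def found = md.'**'.find { it.name() == 'gmd:abstract' }
println found.'gco:CharacterString'.text()
```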

The following is a simple example of a formatter. Remember that view.groovy files are scripts, so all the usual programming constructs apply (if, while, methods, classes).

This example is the file used in the tests that verify the correctness of the framework.

/*
 * The view.groovy script is a groovy script which must configure the handlers object
 * (org.fao.geonet.services.metadata.format.groovy.Handlers).  The script has
 * the following variables bound before execution:
 *
 * - handlers - an org.fao.geonet.services.metadata.format.groovy.Handlers object
 * - f - an org.fao.geonet.services.metadata.format.groovy.Functions object
 * - env - an org.fao.geonet.services.metadata.format.groovy.Environment object.
 *         *IMPORTANT* this object can only be used during process time. When this
 *                     script is executed the org.fao.geonet.services.metadata.format.groovy.Transformer
 *                     object is created but not executed.  The transformer is cached so that the
 *                     groovy processing only needs to be executed once.
 */


/*
 * the handlers.roots method allows you to select the elements in the document where the recursive processing
 * of the xml should begin.  If this is not present then there will be a single root, which is
 * the root element of the metadata.
 * The strings in the roots method are XPath selectors as taken by org.fao.geonet.utils.Xml#selectNodes().
 * All nodes selected by the selectors will be the root nodes.
 *
 * The method roots will set the roots; the method root will add to the existing roots.
 */
handlers.roots ('gmd:distributionInfo//gmd:onLine[1]', 'gmd:identificationInfo/*', 'gmd:referenceSystemInfo')

// In Groovy the parentheses in a method call are optional (in many cases) so roots can be written:
// handlers.roots 'gmd:identificationInfo/*', 'gmd:distributionInfo//*[gco:CharacterString]'

/**
 * Another way to set the roots is to call the roots method with a closure.  This is useful
 * if you need to use data in the env object to determine which roots to select.
 */
handlers.roots {
    if (env.param('brief') == 'true') {
        ['gmd:distributionInfo//gmd:onLine[1]']
    } else {
        ['gmd:distributionInfo//gmd:onLine[1]', 'gmd:identificationInfo/*', 'gmd:referenceSystemInfo']
    }
}

/*
 * a root can also be added by calling:
 */
handlers.root 'gmd:DataQuality'


/*
 * The handlers object is used to register handlers for the identified elements.
 * There are a few ways to specify what a handler can handle.
 *
 * The first and most performant is to specify the name of the element.
 * In the following, any time a gmd:abstract element is encountered the
 * function/closure will be executed, passing in the selected element.
 * As mentioned above, a handler function can return a string, file or XML.
 * In the case below it returns a string which will be parsed into XML.
 * Groovy has multi-line strings with interpolation (see below).
 *
 * Exact-name matchers have a default priority of 1, whereas matchers that use functions to do the matching have a
 * default priority of 0. Matchers are tried first by priority and then in the order in which they are defined.
 *
 * The parameters passed to the handler are:
 * - the GPathResult representing the current node
 * - the current TransformationContext.  This contains information about the current state of the transformation.  It is not
 *   normally of use in the closures.
 *
 * As in JavaScript, you only need to declare as many parameters as you need.
 *
 * The return value will be converted to a string via the toString method and that string will be added to the resulting xml/html
 */
handlers.add 'gmd:abstract', { el ->
    // Don't need a return because last expression of a function is
    // always returned in groovy
    """<p class="abstract">
         <span class="label">${f.nodeLabel('gmd:abstract')}</span>
         <span class="value">${el.'gco:CharacterString'.text()}</span>
       </p>"""
}

/*
 * A start handler is executed when the view creation process begins.  This is required when
 * there are multiple roots or if there will be multiple top level elements.
 * For example if the root is gmd:spatialRepresentation and there are multiple spatialRepresentation
 * elements, then start and end handlers are required to wrap the elements with a single top-level
 * element
 *
 * There is only ever one start and one end handler. Calling the method multiple times will replace the previous handler, and thus
 * the first call will have no effect.
 *
 * There are no parameters to a start and end handler
 */
handlers.start {
    '''<html>
    <body>
'''
}

/*
 * End handlers are executed when the metadata is finished being processed
 */
handlers.end {
    '''    </body>
</html>
'''
}
/*
 * In addition to matching an element name exactly, a regexp can be used for the matching.
 * As in JavaScript, regular expressions in Groovy start and end with /
 */
handlers.add ~/...:title/, { el ->
    """<p class="title">
         <span class="label">${f.nodeLabel(el)}</span>
         <span class="value">${el.'gco:CharacterString'.text()}</span>
       </p>"""
}

/*
 * it is also possible to match on the path of the node.  When matching against a path the path separator is
 * > instead of / because / is the terminator of a regular expression in Groovy
 *
 * The function in this case turns a static method in the class Iso19139Functions into a function (done by the & operator)
 * and passes that function as the handler.
 *
 * The following directories will be scanned for groovy files and made available to the script
 * - format bundle directory
 * - formatter/groovy directory
 * - schema_plugins/<schema>/formatter/groovy
 */
handlers.withPath ~/[^>]+>gmd:identificationInfo>.+extent>.+>gmd:geographicElement/, Iso19139Functions.&handleExtent

/*
 * This example is similar but the class is from the <root formatter dir>/groovy
 */
handlers.withPath ~/[^>]+>gmd:identificationInfo>[^>]+>gmd:pointOfContact/, SharedFunctions.&text

/*
 * Methods can be defined which can be used anywhere in the script.
 * This method takes an element which has gco:CharacterString and/or gmd:PT_FreeText
 * children and finds the translation that best matches the UI language.
 * The UI language is available to the script as env.lang3 and env.lang2,
 * the 3 and 2 letter language codes respectively.
 */
def isoText = { el ->
    def uiCode = "#${env.lang2.toUpperCase()}" // using interpolation to create a code like #DE
    // findAll (not find) so that locStrings is a list of every localised string
    def locStrings = el.'**'.findAll { it.name() == 'gmd:LocalisedCharacterString' }
    def ptEl = locStrings.find { it.'@locale' == uiCode }
    if (ptEl != null) return ptEl.text()
    if (!el.'gco:CharacterString'.isEmpty()) return el.'gco:CharacterString'.text()
    if (!locStrings.isEmpty()) return locStrings[0].text()
    ""
}

/*
 * A second way to define a handler is to provide a function as the first parameter which returns a
 * boolean. (Again you don't need a return because return is implicit)
 *  In groovy functions there is a magic _it_ variable which refers to the single parameter
 * passed in.  You can either simply use it in the function or define a parameter like el ->
 * The matcher function can have 0 - 2 parameters, they are:
 * - GPathResult - the current node
 * - String - the full path of the node
 */
def isRefSysCode = {el, path -> el.name() == 'gmd:code' && path.contains ('gmd:referenceSystemInfo')}

/*
 * due to a limitation of the groovy language the closure must always be the last argument so the matcher
 * must either be a method and provide the &methodName reference or be assigned to a variable like in this
 * example
 */
handlers.add isRefSysCode, { el ->
    /*
     * The html function in f (org.fao.geonet.services.metadata.format.groovy.Functions) allows
     * the use of the very handy groovy.xml.MarkupBuilder which provides a light-weight method of writing XML or HTML.
     *
     * The f.html method takes a closure which can use the groovy.xml.MarkupBuilder and will return the html that has been
     * created as a string.  Therefore it is very useful for building html in handlers
     */
    f.html { html ->
        html.p('class': 'code') {
            span('class': 'label', f.nodeLabel(el.name())) // nodeLabel is a method provided by the framework
            span('class': 'value', isoText(el))
        }
    }
}

/*
 * This example illustrates another way of configuring a handler. This add method takes a map of values and
 * constructs a handler from them.  The values that will be used from the map are select and any JavaBean properties
 * in the object.  For example:
 *
 * - name - a name to help with logging and debugging
 * - priority - handlers with a higher priority will be evaluated before handlers with a lower priority
 */
handlers.add name: 'container el', select: { it.children().size() > 0 }, priority: -1, { el ->
    def childData = handlers.processElements(el.children(), el)

    /*
     * we are returning a FileResult which has a path to the file as first parameter and takes a map
     * of String -> Object which are the replacements.  When this is returned the file will be loaded
     * (UTF-8 by default) and all parts of the file with the pattern: ${key} will be replaced with
     * the value in the replacement map.  So in this example ${label} will be replaced with the
     * translated node name and ${childData} will be replaced with the children XML.
     *
     * File resolution is as follows:
     * - Check for file in same directory as the view.groovy file
     * - If in a schema-plugin then look in the root formatter directory for the file
     * - Finally look in the formatter directory for the file
     */
    if (!childData.isEmpty()) {
        return handlers.fileResult("block.html", [label: f.nodeLabel(el.name()), childData: childData])
    }

    // return null if we don't want to add this element, just because it matches doesn't mean it has to produce data
}

/*
 * Another example of FileResult. In this case it is for a specific element and the template is looked up in the root
 * formatter/groovy directory.
 *
 * This example mixes using a new MarkupBuilder object to create an XML string and then use that (and other) text as substitutions
 * in the FileResult object that is returned.
 *
 * Note:  It is possible to convert a FileResult object to a string via the toString() method.
 *        This is useful when you want to embed the data from one FileResult in another.
 *
 */
handlers.add 'gmd:CI_OnlineResource', { el ->
    def linkage = el.'gmd:linkage'.'gmd:URL'.text()

    if (!linkage.trim().isEmpty()) {
        linkage = f.html {html ->
            html.div ('class':'linkage') {
                span ('class': 'label', f.nodeLabel(el.'gmd:linkage'.'gmd:URL') + ":")
                span ('class': 'value', linkage)
            }
        }
    }
    handlers.fileResult ("groovy/online-resource.html",[
                    resourceLabel: f.nodeLabel(el),
                    name:  isoText(el.'gmd:name'), // get text of name child
                    desc: isoText(el.'gmd:description'), // get text of description child
                    linkage: linkage,

            ])
}

/*
 * This example demonstrates accessing the request parameters.
 *
 * The env (org.fao.geonet.services.metadata.format.groovy.Environment) object has a method for getting the parameters
 */
handlers.add name: 'h2IdentInfo option',
             select: {el -> el.name() == 'gmd:MD_DataIdentification' && env.param('h2IdentInfo').toBool()}, { el ->
    def childData = handlers.processElements(el.children(), el)

    f.html {
        it.div('class':'identificationInfo') {
            h2 (f.nodeLabel(el))
            // mkp.yield and mkp.yieldUnescaped add data to the body of the current tag.  You can also add text
            // as the last parameter of the tag params but that will be escaped.
            //
            // mkp has several useful methods for making XML
            mkp.yieldUnescaped(childData)
        }
    }
}



/**
 * Sorters can be used to control the order in which the data is added to the resulting document.  When the children of an
 * element are being processed a sorter (if its matches method selects the element) will sort the data before it is added
 * to the document.
 *
 * Like handlers, sorters can be prioritized so that the highest priority sorter that matches an element will be applied.
 *
 * The data passed to be sorted are org.fao.geonet.services.metadata.format.groovy.SortData objects.
 */
handlers.sort ~/.*/, {el1, el2 ->
    el1.name().compareTo(el2.name())
}

def sortVal = { el ->
    switch (el.name()) {
        case "gmd:abstract":
        case "gmd:pointOfContact":
            return 0
        default:
            return 1;
    }
}

/*
 * As with handlers, sorters have a priority and can be created using the map form.
 *
 * Note:  Both handlers and sorters have a name parameter that is used during debugging and logging.  It has
 * no functional effect.
 *
 * For the map form of handler and handlers any
 */
handlers.sort name:'Sort data identification children', select: 'gmd:MD_DataIdentification', priority: 5, {el1, el2 ->
    sortVal(el1) - sortVal(el2)
}
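
The priority rule described in the script comments (exact names default to priority 1, matcher functions to priority 0, ties resolved by definition order) can be sketched with a toy registry. MiniHandlers is hypothetical and is not the framework's Handlers class; elements are plain maps here rather than GPathResult nodes.

```groovy
// Hypothetical mini-registry used only to illustrate the dispatch order.
class MiniHandlers {
    private List entries = []

    // Exact element names default to priority 1.
    void add(String name, Closure handler) {
        entries << [priority: 1, matches: { el -> el.name == name }, handler: handler]
    }

    // Matcher functions default to priority 0.
    void add(Closure matcher, Closure handler) {
        entries << [priority: 0, matches: matcher, handler: handler]
    }

    // Highest priority first; the sort is stable, so ties keep definition order.
    String process(Map el) {
        def ordered = entries.sort(false) { -it.priority }
        def entry = ordered.find { it.matches.call(el) }
        entry ? entry.handler.call(el)?.toString() : null
    }
}

def registry = new MiniHandlers()
// Registered first, but has the lower priority.
registry.add({ el -> el.name.startsWith('gmd:') }, { el -> "generic:${el.name}" })
registry.add('gmd:abstract', { el -> "abstract:${el.text}" })

println registry.process(name: 'gmd:abstract', text: 'Lakes')
println registry.process(name: 'gmd:title', text: 'Atlas')
println registry.process(name: 'gco:Date', text: '2014')
```

The exact-name handler wins for gmd:abstract even though the function matcher was registered first and also matches. An element that no handler matches produces no output.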

Groovy Strings

A note about strings in Groovy: both ' and " create strings in Groovy (as in JavaScript). However, ' creates a literal string whereas " creates a string that allows interpolation. Interpolation means that variables and method calls inside the string are evaluated and the results are placed in the string. For example, if there is a variable count = 5 you can write the string "There are $count boxes", which results in There are 5 boxes. It is important to realize that if you use ' instead of " you will not get interpolation.

To call methods you can use the ${...} form around the code and the result from the code execution will be put in the string. For example "The max is ${Math.max(2,3)}" will result in The max is 3.

Multiline strings: three " or three ' characters together start a multiline string. As with normal strings, """...""" allows interpolation whereas '''...''' does not, so what is written is literally what you get.
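
The string forms described above, side by side in a short script (the variable names are only for illustration):

```groovy
def count = 5

def interpolated = "There are $count boxes"      // double quotes: $count is replaced
def literal = 'There are $count boxes'           // single quotes: text is kept as written
def expr = "The max is ${Math.max(2, 3)}"        // ${...} evaluates an expression

// Triple double quotes: multiline with interpolation.
def multi = """<p>
  <span>${count}</span>
</p>"""

// Triple single quotes: multiline without interpolation.
def multiLiteral = '''<p>
  <span>${count}</span>
</p>'''

println interpolated
println literal
println expr
println multi
println multiLiteral
```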

Shared Functions

Since there are likely to be many shared functions between formatters, the paths schema_plugins/<plugin>/formatter/groovy and /groovy will automatically be included on the classpath when executing a Groovy formatter. All files in those directories are assumed to be Groovy source files that define classes rather than simple Groovy scripts, so all code should be encapsulated within classes.
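
A shared file might look like the following sketch. The class body is hypothetical (the sample script references a SharedFunctions class with a text method, but its implementation is not shown in this proposal), and an Expando stands in for a parsed metadata element so that the example runs on its own.

```groovy
// Hypothetical shared class, e.g. saved as SharedFunctions.groovy in one of
// the shared groovy directories. The code lives in a class, not a loose script.
class SharedFunctions {
    static String text(el) {
        "<p>${el.text()}</p>"
    }
}

// The .& operator turns the static method into a closure, which is the form
// used when passing it as a handler.
Closure handler = SharedFunctions.&text

// Stand-in for a metadata element: any object with a text() method will do.
def fakeElement = new Expando(text: { 'Point of contact' })
def rendered = handler(fakeElement)
println rendered
```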

XmlSlurper Resources:

Groovy DSL Advantages

More terse. Less code is required to do the same amount of work, which makes it easier to determine the intent of the code.
Groovy is a full programming language, which means that developers are not limited by the functionality of XSLT. It also means that it is easy to use third-party libraries like GeoTools or Joda-Time directly.
Debugging is much easier since you can put breakpoints in the files and step through the execution.
Stack traces take you directly to the correct file and line. The error messages are also very mature.
Performance should be very good once the scripts are cached.
Performance tuning is easier since normal Java performance analysis tools will work.

Proposal Type:

Type: Metadata Viewer
Module: services, schemas

Voting History

Vote Proposed: TBA

Participants

All
