Transformation Architecture

Ian Ibbotson edited this page Dec 24, 2013 · 14 revisions

NOTE

As of 4.0, the following information is no longer valid. Transformations are now defined in Config.groovy for each object type. There is no user configuration of standard transforms. The transforms are handled in the app using XALAN and standard Grails idioms. For bespoke transforms, rather than registering the transforms in the app, users can supply the URL of the XSL to apply to the document. This provides an extensible "Long Tail" of transforms with an efficient transformation engine.

What is a Transformation?

In this context, a transformation is the process of taking XML output from the KBPlus application and converting it into another machine-readable format. The XML would typically be a list of journal titles. In most cases, the target formats are likely to be proprietary formats that are readable by various types of resolver.
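To make the idea concrete, here is a minimal sketch of the kind of conversion involved. It is written in Python purely for illustration and is not how KBPlus performs the work (the real transforms are XSLT stylesheets). The element names `titles`, `title`, `name` and `issn` are hypothetical, as is the tab-separated output; the real export schema and resolver formats are specified elsewhere.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the real KBPlus export schema is not reproduced here.
SAMPLE_XML = """<titles>
  <title><name>Journal of Examples</name><issn>1234-5678</issn></title>
  <title><name>Annals of Placeholders</name><issn>8765-4321</issn></title>
</titles>"""


def titles_to_tsv(xml_text: str) -> str:
    """Turn the hypothetical title list into tab-separated lines."""
    root = ET.fromstring(xml_text)
    lines = ["title\tissn"]  # header row of the made-up target format
    for title in root.findall("title"):
        name = title.findtext("name", default="")
        issn = title.findtext("issn", default="")
        lines.append(f"{name}\t{issn}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(titles_to_tsv(SAMPLE_XML), end="")
```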

Specification

Broadly, the requirements were:

  • The transformation should be triggered by the user from within the web interface of the KB Plus Application.
  • The user should be able to choose on their profile page which of the available transformations they want to appear in their export menu.
  • The transformation itself should occur outside of the KBPlus Application.
  • It should be easy to add new transformations.
  • OS will provide specifications for the actual formats needed by the resolvers.

Architecture

Within the KB Plus Application

The KBPlus Application supports the export of entitlement information in a standard XML format (specified by OS), directly from the screens where users browse their institutional data. This XML can optionally be routed to an external transformation.

The available external transformations are listed in two tables in the KBPlus database. The transformer table lists the external applications available for offloading transformation work (there is currently only one). The transforms table lists all the available transforms, including a readable name, a path to a stylesheet, and the formats accepted and returned by that transform.

On the user profile page, within the KBPlus Application, a user can approve a transform. This approval is stored in the user_transforms table, and thus persists across sessions. Once approved in this way, that user will see that particular transform in their export menu.
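As a rough illustration of how these three tables could fit together, the sketch below models them in an in-memory SQLite database. Only the table names (transformer, transforms, user_transforms) come from this page; every column name, the foreign keys, the user id 42 and the use of SQLite are assumptions made for the example, not the actual KBPlus schema.

```python
import sqlite3

# Table names come from the text; every column name is an assumption.
SCHEMA = """
CREATE TABLE transformer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    url  TEXT NOT NULL
);
CREATE TABLE transforms (
    id                 INTEGER PRIMARY KEY,
    transformer_id     INTEGER NOT NULL REFERENCES transformer(id),
    name               TEXT NOT NULL,
    path_to_stylesheet TEXT NOT NULL,
    accepts_format     TEXT NOT NULL,
    return_mime        TEXT NOT NULL
);
CREATE TABLE user_transforms (
    user_id      INTEGER NOT NULL,
    transform_id INTEGER NOT NULL REFERENCES transforms(id),
    PRIMARY KEY (user_id, transform_id)
);
"""


def export_menu(conn, user_id):
    """Return the names of the transforms this user has approved."""
    rows = conn.execute(
        "SELECT t.name FROM transforms t "
        "JOIN user_transforms ut ON ut.transform_id = t.id "
        "WHERE ut.user_id = ? ORDER BY t.name",
        (user_id,),
    )
    return [name for (name,) in rows]


def demo():
    """Build a tiny database with one transformer, one transform, one approval."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO transformer VALUES (1, 'EDINA XSLT', "
        "'http://dev.kbplus.ac.uk/cgi-bin/KBPlus_transformer.cgi')"
    )
    conn.execute(
        "INSERT INTO transforms VALUES (1, 1, 'Serials Solutions Resolver', "
        "'/home/kbwww/git/KBPlus/transformer/stylesheets/serialssolutions.xslt', "
        "'xml', 'text/plain')"
    )
    # Hypothetical user 42 approves the transform on their profile page.
    conn.execute("INSERT INTO user_transforms VALUES (42, 1)")
    return conn
```

With this layout, the export menu for a user is a join between transforms and user_transforms, so a user who has approved nothing simply gets an empty menu.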

When the user selects an export from the menu, the KBPlus Application creates an XML version of the data on the user's current page. The user does not see this XML. Instead, the XML is sent via an HTTP POST to an external script.

External to the KB Plus Application

An external script receives the POST from the KBPlus Application. This script takes two parameters:

  • xml: The XML itself
  • path: The name of the transformation

The name of the transformation must match an XSLT stylesheet that is available to the script on its local filesystem.
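The sketch below shows what the calling side of this exchange might look like. It only builds the request and does not send it. The parameter names xml and path come from the list above; the form encoding (application/x-www-form-urlencoded), the function name and the example.org URL are assumptions for illustration, and the KBPlus Application itself is not written in Python.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_transform_request(xml_text: str, stylesheet_path: str, url: str) -> Request:
    """Build (but do not send) a form-encoded POST carrying xml and path."""
    body = urlencode({"xml": xml_text, "path": stylesheet_path}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Hypothetical URL and stylesheet path, used only to show the call shape.
    req = build_transform_request(
        "<titles/>", "/tmp/example.xslt", "http://example.org/cgi-bin/transform.cgi"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req).read()
```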

The script is an extremely simple CGI script that passes on the work of the transformation to the libxslt C libraries.

The script returns the transformed text to the client, which in this case is the KBPlus app, which then delivers it back to the user.
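The request handling described above can be sketched as follows. This is a Python stand-in for the flow, not the actual Perl script: the XSLT engine is passed in as a function so that the sketch stays independent of any particular library, whereas the real script hands that work to libxslt. The function name, the form-encoded body and the 400 response for missing parameters are assumptions.

```python
from urllib.parse import parse_qs


def handle_transform_post(body: bytes, apply_xslt):
    """Parse a form-encoded POST body and delegate to an XSLT engine.

    apply_xslt(xml_text, stylesheet_path) -> str is supplied by the caller;
    in the script described here that work is done by libxslt.
    Returns a (status_code, response_text) pair.
    """
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    xml_values = fields.get("xml")
    path_values = fields.get("path")
    if not xml_values or not path_values:
        # Assumed behaviour: reject requests that lack either parameter.
        return 400, "Both 'xml' and 'path' parameters are required\n"
    return 200, apply_xslt(xml_values[0], path_values[0])
```

Because the engine is injected, the flow can be exercised without a stylesheet, for example with a placeholder such as `lambda xml, path: xml.upper()`.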

The script can reside on any web server that will run CGI Perl and has the XML::LibXSLT CPAN module correctly installed. However, as the XML can be several megabytes, I'd recommend that the web server is on the same subnet as the KBPlus App. This keeps lag in the HTTP transfer to a minimum.

One advantage of this simple approach is that the transformation script does not need to concern itself with user permissions. The decision to give the user the XML needed for the transformation is in the hands of the KBPlus Application.

Implementation

Here are some details of our specific implementation. This is not by any means the only way of doing this.

On the KBPlus Servers we already have an Apache webserver running that is entirely independent of the Tomcat server that runs the KBPlus Application. This runs under the kbwww user. The transformation CGI script is installed here: /home/kbwww/www/cgi-bin/KBPlus_transformer.cgi

and is therefore accessible, for example, at the following URL:

http://dev.kbplus.ac.uk/cgi-bin/KBPlus_transformer.cgi

Note that users never see this URL.

Note that in reality /home/kbwww/www/cgi-bin/KBPlus_transformer.cgi is just a symlink to /home/kbwww/git/KBPlus/transformer/cgi-bin/KBPlus_transformer.cgi, which is inside a local git clone of the project.

Stylesheets are stored here:

/home/kbwww/git/KBPlus/transformer/stylesheets

Note that the above directory /home/kbwww/git/KBPlus contains a git clone of the project repository.

So an example value for the path parameter passed to the script in the POST might be:

/home/kbwww/git/KBPlus/transformer/stylesheets/serialssolutions.xslt

Source code is here

How to add new transformations

  • Create an XSLT stylesheet
  • Copy it to the file system on which the transformer script lives
  • Tell the KBPlus Application that it exists

Step one should include adding the stylesheet to the git repository under transformer/stylesheets.

We do step two by doing a git pull in /home/kbwww/git/KBPlus to refresh the directory on the deployment machine.

Step three is done by putting data into the transformer and transforms database tables. This can be done either directly, or by putting some configuration parameters into the file ~/.grails/demo-config.groovy and restarting Tomcat.

For example:

systransforms = [
  [
    transformer_name: 'EDINA XSLT',
    transforms_name: 'Serials Solutions Resolver',
    url: 'http://dev.kbplus.ac.uk/cgi-bin/KBPlus_transformer.cgi',
    type: 'title',
    format: 'xml',
    return_file_extension: 'txt',
    return_mime: 'text/plain',
    path_to_stylesheet: '/home/kbwww/git/KBPlus/transformer/stylesheets/serialssolutions.xslt'
  ]
]
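Since one configuration entry feeds both database tables, it may help to see how its keys could divide between them. The sketch below copies the entry above into a Python dict and splits it. The assumption that transformer_name and url describe the transformer row, and that everything else describes the transforms row, is a guess based on the table descriptions earlier on this page; the function name is hypothetical and this is not how the application loads its configuration.

```python
# The example entry from this page, written as a Python dict.
ENTRY = {
    "transformer_name": "EDINA XSLT",
    "transforms_name": "Serials Solutions Resolver",
    "url": "http://dev.kbplus.ac.uk/cgi-bin/KBPlus_transformer.cgi",
    "type": "title",
    "format": "xml",
    "return_file_extension": "txt",
    "return_mime": "text/plain",
    "path_to_stylesheet": "/home/kbwww/git/KBPlus/transformer/stylesheets/serialssolutions.xslt",
}

# Assumed split: these keys describe the external application itself.
TRANSFORMER_KEYS = ("transformer_name", "url")


def split_entry(entry):
    """Return (transformer_row, transform_row) for one config entry."""
    transformer = {key: entry[key] for key in TRANSFORMER_KEYS}
    transform = {key: value for key, value in entry.items()
                 if key not in TRANSFORMER_KEYS}
    return transformer, transform


if __name__ == "__main__":
    transformer_row, transform_row = split_entry(ENTRY)
    print(transformer_row)
    print(transform_row)
```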
