Skip to content

Topcat's schema driven query builder

Brian Ritchie edited this page Jul 23, 2019 · 5 revisions

Topcat's schema-driven query builder

Brian Ritchie, Feb 2019

Introduction

In issue#392, Rolf Krahl suggested that Topcat's Javascript application should not contact ICAT directly, to reduce exposure of the ICAT interface to the outside world. In particular, the Javascript should not be sending raw JPQL query strings to ICAT's entityManager interface.

I have spent some time looking into how this might be achieved; and though I have decided that it would require greater changes to the code than is worthwhile, it is probably worth documenting what I have discovered about the approach used in Topcat. I believe that the original code was developed by Jody Salt, but it has never been documented.

Overview

Topcat tries hard to be generic, with facility-specific details provided through configuration, largely in a giant topcat.json structure. Each facility can define its own entity hierarchy, and for each entity, the columns used in display tables for browsing that entity. This means that the main code has to cope with arbitrary table shapes and column contents; so queries are constructed depending on the configuration.

A key feature of the query construction is that it is driven in part from Topcat's own internal model (in Javascript) of the ICAT schema. This model describes the entities and relationships between them.

For each entity type the schema model defines a canonical variable name that can be used in column definitions, filters, etc. When the definition of, or filter upon a column in a display table for entity type A refers to a variable for entity type B, in constructing the query, Topcat needs to know what joins to include to link B to A; this is done by calculating the shortest variable path from the schema from A to B.

As the schema is fixed, these calculations need only be made once (indeed, the resulting paths could have been hard-wired themselves). In the code, this happens in the construction of the icatSchema factory instance in Javascript.

The IcatQueryBuilder service is used to construct queries on-demand. It provides methods for defining most of the query clauses, and a final build method that constructs the final query string. The build method is not used externally, but is used in the service's methods that submit the query to ICAT in various forms (run, count, min and max). The build method is where most of the work is done, analysing the where clauses and using information derived from the variable paths to determine the joins. A further complication is that the arguments passed to the clause constructors can take the form of parameterised fragment strings followed by lists of arguments that can be values or functions, and such fragments can also be nested in lists. The work of converting such lists into a single query string is done by a separate buildQuery method in the helpers service.

Though the IcatQueryBuilder is used in many places, there are others where Topcat constructs queries in a more ad-hoc manner.

The icatSchema factory

Location: topcat/yo/app/scripts/services/icat-schema-factory.js

The icat schema is defined/described as a large JSON object that defines, for each named entity type in the schema, its fields and its relationships to other entities. Dataset is a typical example:

	'dataset': {
		'fields':{
		'name': 'string',
		'location': 'string',
		'complete': 'boolean',
		'startDate': 'date',
		'doi': 'string',
		'createId': 'string',
		'createTime': 'date',
		'modTime': 'date',
		'description': 'string',
		'modId': 'string',
		'endDate': 'date'
		},
		'relationships':{
			'sample': {
				'entityType': 'sample',
				'cardinality': '0,1',
				'variableName': 'datasetSample'
			},
			'dataCollectionDatasets': {
				'entityType': 'dataCollectionDataset',
				'cardinality': '0,*'
			},
			'investigation': {
				'entityType': 'investigation',
				'cardinality': '1,1',
				'variableName': 'investigation'
			},
			'datafiles': {
				'entityType': 'datafile',
				'cardinality': '0,*',
				'variableName': 'datafile'
			},
			'type': {
				'entityType': 'datasetType',
				'cardinality': '1,1',
				'variableName': 'datasetType'
			},
			'parameters': {
				'entityType': 'datasetParameter',
				'cardinality': '0,*',
				'variableName': 'datasetParameter'
			}
		}
	},

Not all relationships define a variableName, but for those that do, the variableName can be used in where-clauses in queries selecting the "parent" entity. For example, a query selecting datasets can refer to datasetType in its where clause; Topcat will then be able to determine what joins are required.

The factory constructor calculates a variablePaths object for each entityType and adds it to its schema (i.e. alongside the fields and relationships). The calculation is particularly complex, but appears to be traversing the relationships to construct the shortest path between each pair of entity types. It appears to calculate all possible paths, then chooses the shortest; there may be a more efficient calculation, but this doesn't seem to cause performance issues even though it is being calculated every time the application is loaded into the browser.

(In the code, out is the JSON object representing the schema.)

	function findPossibleVariablePaths(entityType, path, visited, paths){
		if(!path){
			path = [];
			visited = [];
			paths = {};
		}
		if(out.entityTypes[entityType]){
			_.each(out.entityTypes[entityType]['relationships'], function(relationship, relationshipName){
				if(!relationship['variableName']) return;
				var currentPath = _.clone(path);
				var currentVisited = _.clone(visited);
				currentPath.push(relationshipName);
				if(!paths[relationship['variableName']]) paths[relationship['variableName']] = [];
				paths[relationship['variableName']].push(currentPath);
				if(!_.include(visited, relationship['entityType'])){
					currentVisited.push(relationship['entityType']);
					findPossibleVariablePaths(relationship['entityType'], currentPath, currentVisited, paths);
				}
			});
		}

		return paths;
	}

	function findVariablePaths(entityType){
		var out = {}
		_.each(findPossibleVariablePaths(entityType), function(paths, variableName){
			paths = _.sortBy(paths, function(path){
				return path.length;
			});
			out[variableName] = paths[0];
		});
		return out;
	}

	_.each(out.entityTypes, function(entitySchema, entityType){
		entitySchema['variablePaths'] = findVariablePaths(entityType);
	});

Finally, the constructor calculates a variableEntityTypes map and adds this to the schema object. This maps each relationship's variableName to the relationship's entityType.

	out.variableEntityTypes = {};

	_.each(out.entityTypes, function(entityTypeSchema){
		_.each(entityTypeSchema.relationships, function(relationship){
			if(relationship.variableName){
				out.variableEntityTypes[relationship.variableName] = relationship.entityType;
			}
		});
	});

Examples

Inspection of the icatSchema in the browser shows an example of the calculated variablePaths for dataset to be:

  variablePaths: {...}
    datafile: Array [ "datafiles" ]
    ​​​datafileFormat: Array [ "datafiles", "datafileFormat" ]
    datafileParameter: Array [ "datafiles", "parameters" ]
    datafileParameterType: Array(3) [ "datafiles", "parameters", "type" ]
    dataset: Array [ "investigation", "datasets" ]
    datasetParameter: Array [ "parameters" ]
    datasetParameterType: Array [ "parameters", "type" ]
    datasetSample: Array [ "sample" ]
    datasetType: Array [ "type" ]
    facility: Array [ "investigation", "facility" ]
    facilityCycle: Array(3) [ "investigation", "facility", "facilityCycles" ]
    instrument: Array(3) [ "investigation", "facility", "instruments" ]
    instrumentScientist: Array(5) [ "investigation", "facility", "instruments",  ]
    instrumentScientistPivot: Array(4) [ "investigation", "facility", "instruments",  ]
    investigation: Array [ "investigation" ]
    investigationInstrument: Array(3) [ "investigation", "investigationInstruments", "instrument" ]
    investigationInstrumentPivot: Array [ "investigation", "investigationInstruments" ]
    investigationParameter: Array [ "investigation", "parameters" ]
    investigationParameterType: Array(3) [ "investigation", "parameters", "type" ]
    investigationSample: Array [ "investigation", "samples" ]
    investigationStudy: Array(3) [ "investigation", "studyInvestigations", "study" ]
    investigationUser: Array(3) [ "investigation", "investigationUsers", "user" ]
    investigationUserPivot: Array [ "investigation", "investigationUsers" ]
    publication: Array [ "investigation", "publications" ]
    studyInvestigationPivot: Array [ "investigation", "studyInvestigations" ]

and the variableEntityTypes (across all entities) are:

  variableEntityTypes: {...}
    dataCollectionParameter: "dataCollectionParameter"
    datafile: "datafile"
    datafileFormat: "datafileFormat"
    datafileParameter: "datafileParameter"
    datafileParameterType: "parameterType"
    dataset: "dataset"
    datasetParameter: "datasetParameter"
    datasetParameterType: "parameterType"
    datasetSample: "sample"
    datasetType: "datasetType"
    facility: "facility"
    facilityCycle: "facilityCycle"
    instrument: "instrument"
    instrumentScientist: "user"
    instrumentScientistPivot: "instrumentScientist"
    investigation: "investigation"
    investigationGroup: "grouping"
    investigationInstrument: "instrument"
    investigationInstrumentPivot: "investigationInstrument"
    investigationParameter: "investigationParameter"
    investigationParameterType: "parameterType"
    investigationSample: "sample"
    investigationStudy: "study"
    investigationUser: "user"
    investigationUserPivot: "investigationUser"
    publication: "publication"
    studyInvestigationPivot: "studyInvestigation"

Refactoring potential

With issue #392 in mind, if we were to move the query construction from Javascript to Java, then this code would have to be moved, or perhaps even replicated. (The icatSchema is used for other purposes besides query construction.) Replication seems inefficient and risky from a maintenance viewpoint.

As the schema is (usually) fixed, the variablePaths calculation could be replaced by hard-wired values, but there seems to be little point in doing this, even though for any particular configuration, the subset of variable paths that will be actually used is probably quite small.

I suspect that a less generic Topcat would not require such a complicated mechanism. However, the approach may be of interest for development of a more abstract query interface to ICAT.

The IcatQueryBuilder service

This is the main mechanism for construction of queries in Topcat. Most of the interface methods build lists of query clauses; the work of creating the query itself is performed in a final build stage.

The interface this provides to the rest of Topcat is:

  • the constructor takes an entityType; that is used to generate the select clause and drive much else in the final build stage

  • .where() adds a clause to the list of where clauses; the argument can be a single string, or a list containing a parameterised string and some form of arguments (which may be values or functions to be invoked during the final query build)

  • .orderBy() takes a fieldName and an optional direction, and adds this to the orderByList;

  • .limit() takes one or two arguments; the forms are .limit(count) and .limit(offset,count); it defines an appropriate limit clause for the query

  • .include(variableName) adds an expression to the include list. This uses the variablePaths calculated by the icatSchema factory

  • .build() constructs the query from the previously-supplied information. It can take up to three arguments: functionName, fieldName and investigationName, all optional. It appears only to be used by IcatQueryBuilder itself, within .run(), .count(), .min() and .max().

  • .run() runs the query on ICAT

  • .count() runs a count for the query on ICAT

  • .min(), .max() run mix/max queries on ICAT

The service initialisation includes calculation of a "steps" object; this uses the schema variablePaths to map a string of the form entityType.path to a variableName. This is later used when calculating the join expressions.

    var steps = {};
    _.each(icatSchema.entityTypes, function(entitySchema, entityType){
        _.each(entitySchema.variablePaths, function(path, variableName){
            var matches;
            if(path.length == 1){
                steps[entityType + '.' + path[0]] = variableName;
            }
        });
    });

Further initialisation is done in the IcatQueryBuilder constructor itself, including:

  function IcatQueryBuilder(icat, entityType){
      var that = this;
      var facility = icat.facility();
      var facilityName = facility.config().name;
      var tc = facility.tc();
      var user = tc.user(facilityName);
      var cart = user.cart();
      var select = entityType;
      var whereList = [];
      var orderByList = [];
      var includeList = [];
      var variablePaths = icatSchema.entityTypes[entityType].variablePaths;
      var limitOffset;
      var limitCount;

Note that variablePaths is set to the variablePaths of the given entity in the icatSchema.

Most of the exposed methods are straightforward. .include() uses variablePaths to determine which path expression to add to the includeList.

    this.include = function(variableName){
        var variablePath = variablePaths[variableName] || [];
        var path = _.flatten([[entityType], variablePath]).join('.');
        if(!_.contains(includeList, path) && entityType != path && !(entityType == 'proposal' && variableName == 'investigation')){
            includeList.push(path);
        }
        return this;
    };

Note the specific case-handling for the 'proposal' entityType and 'investigation' variables (where the path is not included). The treatment of 'proposal' in Topcat is probably worth a separate document!

The bulk of the work is done in the .build() method. This constructs a list of query fragments, where each fragment is a list containing a string and possibly arguments (when the string contains "?" placeholders).

    this.build = function(functionName, fieldName, investigationName){
      var out = [];
      var distinct = functionName == 'count' ? 'distinct ' : '';

out is the variable used to hold the final (or near-final) form of the query.

The first fragments to be added are the select clause and the initial from clause. This part of the construction depends on whether the functionName, fieldName and/or investigationName arguments are defined.

Note that .build() appears only to be used within IcatQueryBuilder itself, in the .run(), .count(), .min() and .max() methods. In most cases, build() is called with no arguments; other cases will be considered later.

For entityTypes other than 'proposal', the construction is:

	if(functionName){
		if(fieldName){
			out.push(["select ?(??.?)", functionName.safe(), distinct.safe(), entityType.safe(), fieldName.safe()]);
		} else {
			out.push(["select ?(??)", functionName.safe(), distinct.safe(), entityType.safe()]);
		}
	} else {
		out.push(["select distinct ?", entityType.safe()]);
	}

	out.push([
        "from ? ?", helpers.capitalize(entityType).safe(), entityType.safe()
	]);

So when build() has no arguments, for a given entityType of "dataset" this adds the following pair of lists to the out list:

    ["select distinct ?", "dataset"], ["from ? ?", "Dataset", "dataset"]

(Strictly, the three instances of "dataset" and "Dataset" in these lists are not simple strings: .safe() converts a String into a SafeString, which is defined in helpers.service. It does not do any real processing on the string (e.g. to make it safe) but is used to declare that the string is safe, and so does not require any special processing in later steps.)

Much later, this will be processed to create the query fragment:

    "select distinct dataset from Dataset dataset"

A similar but completely separate construction is made when the query's entityType is 'proposal' (roughly: at this point, 'proposal' is replaced by 'investigation').

	if(entityType == 'proposal'){
		if(investigationName){
			if(functionName){
				if(fieldName){
					out.push(["select ?(?investigation.?)", functionName.safe(), distinct.safe(), fieldName.safe()]);
				} else {
					out.push(["select ?(?investigation)", functionName.safe(), distinct.safe()]);
				}
			} else {
				out.push(["select investigation"]);
			}
		} else {
			if(functionName){
				if(fieldName){
					out.push(["select ?(distinct investigation.?)", functionName.safe(), fieldName.safe()]);
				} else {
					out.push(["select ?(distinct investigation.name)", functionName.safe()]);
				}
			} else {
				out.push(["select distinct investigation.name"]);
			}
		}

		out.push([
			"from Investigation investigation"
		]);
	} else {
		... general case above ...

The next stage is to calculate the joins for the "from" clause. This starts by building a list of impliedPaths, first by analysing the whereList expressions. To do this, it uses helpers.buildQuery to build a temporary copy of the where clauses where any "?" placeholders have been replaced by the supplied values. For each variable name that appears in the whereQuery, its variablePath (from the icatSchema) is added to impliedPaths.

		var whereQuery = _.clone(whereList);
		var whereQueryFragment = helpers.buildQuery(whereQuery);
		var impliedPaths = {};

		_.each(variablePaths, function(path, name){
			if(name != entityType && whereQueryFragment.indexOf(name) >= 0){
				impliedPaths[name] = path;
			}
		});

Aside: the use of indexOf here implies that the code could be fooled when one variable name is a substring of another. One possible example is datafile and datafileParameterType, which are both variableNames of relationships within datafileParameter. I have no evidence of any problems arising due to this.

The orderByList is processed similarly:

    _.each(orderByList, function(orderBy){
        var name = orderBy.replace(/\.[^\.]+$/, '');
        if(name !=  entityType){
            impliedPaths[name] = variablePaths[name];
        }
    });

A special case for facilityCycle (whether it is the query entityType or referenced in the impliedPaths): an extra where clause is added to relate the investigation startDate to the facilityCycle's date-range; and if the query entityType is neither 'investigation' nor 'proposal', then the variablePath for 'investigation' is added to impliedPaths (since 'investigation' is now mentioned in the whereQuery, and may not have appeared originally).

	if(impliedPaths.facilityCycle || entityType == 'facilityCycle'){
        if(whereQuery.length > 0) whereQuery.push('and');
		whereQuery.push('investigation.startDate BETWEEN facilityCycle.startDate AND facilityCycle.endDate')
		if(entityType != 'investigation' && entityType != 'proposal'){
			impliedPaths['investigation'] = variablePaths['investigation'];
		}
	}

If the investigationName argument to build() is defined, then a where-clause is added to restrict to matching investigations. This is used in the .run() method in the 'proposal' case (see below); this is probably the only use.

	if(investigationName){
	    if(whereQuery.length > 0) whereQuery.push('and');
	    whereQuery.push(["investigation.name = ?", investigationName]);
	}

If the query entityType is 'proposal', then the first element is removed from each of the impliedPaths. (I presume this works in tandem with other special-case coding for 'proposal', which treats it as a pseudo-level in the hierarchy that is (largely) synonymous with 'investigation'; but later processing steps must be born in mind!)

    if(entityType == 'proposal'){
        var alteredImpliedPaths = {}
        _.each(impliedPaths, function(path, name){
            path = path.slice(1);
            alteredImpliedPaths[name] = path;
        });
        impliedPaths = alteredImpliedPaths;
    }

Next, the impliedPaths are used to construct impliedVars, and this is used to construct the joins list:

   var impliedVars = this.impliedPathsToImpliedSteps(impliedPaths);

	var joins = [];
	_.each(impliedVars, function(name, pair){
		joins.push([", ? as ?", pair.safe(), name.safe()]);
	});

	if(joins.length > 0){
		out.push(joins);
	}

(See below for coverage of impliedPathsToImpliedSteps(); it uses impliedPaths and steps to build a map from expressions of the form variable.relationship to (a different) variable.)

The remainder of the build() method is straightforward, except for the final step:

    if(joins.length > 0){
	out.push(joins);
    }

    if(whereQuery.length > 0){
	out.push(['where', whereQuery]);
    }

    if(orderByList.length > 0){
        out.push(['ORDER BY', orderByList.join(', ')]);
    }

    if(!functionName && limitCount && !investigationName){
        out.push(['limit ?, ?', limitOffset, limitCount]);
    }

    if(!functionName && includeList.length > 0 && !investigationName){
        out.push(['include', includeList.join(', ')]);
    }

	return helpers.buildQuery(out);

The final step passes 'out' (which by now is a list of nested lists of possibly-parameterised strings and their arguments) to helpers.buildQuery(), which flattens the query list and replaces all "?" placeholders with their supplied values. (See later for more details on what helpers.buildQuery() does.)

The run() method runs the query on ICAT. It uses helpers.overload() to define multiple argument forms; the first case ('object') is the most general, and other cases are defined in terms of it. Note that it (initially) calls build() with no arguments, and in all but the proposal case this is what it returns:

    this.run = helpers.overload({
        /**
         * Runs the query on the Icat server.
         * 
         * @method
         * @name IcatQueryBuilder#run
         * @param  {object} options {@link https://docs.angularjs.org/api/ng/service/$http#usage|as specified in the Angular documentation}
         * @return {Promise<IcatEntity[]>} a deferred array of icat entities
         */
		'object': function(options){
            var out = icat.query([this.build()], options);

The larger part of the code here is for special-case treatment of the 'proposal' entityType. In that case, the original query is sent to ICAT and is presumed to return a list of investigation names. For each name, build() is called again, this time supplying the name as the investigationName parameter (and so presumably restricting the query to that named investigation). This is then sent to ICAT, and the result added to a list of proposals; the results of these queries are combined and returned as the final response.

(This suggests that the entityType 'proposal' is not a true ICAT entity, but is treated as a set of investigations. A query for 'proposal' is "translated" into a query on investigation (names), but in the final run() step the query is run for all investigations with matching names.)

            if(entityType == 'proposal'){
                var defered = $q.defer();
                out.then(function(names){
                    var promises = [];
                    var proposals = [];

                    _.each(names, function(name){
                        promises.push(icat.query([that.build(null, null, name)], options).then(function(investigations){
                            var proposal = {};
                            proposal.entityType = "proposal";
                            proposal.id = investigations[0].name;
                            proposal.name = investigations[0].name;
                            proposal.investigations = investigations;
                            proposal = tcIcatEntity.create(proposal, icat.facility());
                            proposals.push(proposal);
                        }));
                    });
                    
                    $q.all(promises).then(function(){
                        var proposalIndex = {};
                        _.each(proposals, function(proposal){
                            proposalIndex[proposal.name] = proposal; 
                        });
                        proposals = _.map(names, function(name){
                            return proposalIndex[name]
                        });
                        defered.resolve(proposals);
                    });
                    
                });
                return defered.promise;
            } else {
			    return out;
            }
		},

The remaining overloaded cases are defined in terms of the above:

        /**
         * Runs the query on the Icat server.
         * 
         * @method
         * @name IcatQueryBuilder#run
         * @param  {Promise} timeout if resolved the request will be cancelled
         * @return {Promise<IcatEntity[]>} a deferred array of icat entities
         */
		'promise': function(timeout){
			return this.run({timeout: timeout});
		},

        /**
         * Runs the query on the Icat server.
         * 
         * @method
         * @name IcatQueryBuilder#run
         * @return {Promise<IcatEntity[]>} a deferred array of icat entities
         */
		'': function(){
			return this.run({});
		}
	});

count() is another overloaded function (here only the general case is shown). This calls build() with a first argument (functionName) of 'count' - recall that this controls the construction of the 'select' clause: the result will be a select expression of the form select count(distinct entityType).

    this.count = helpers.overload({
        /**
         * Counts the number of results that will be returned.
         * 
         * @method
         * @name IcatQueryBuilder#count
         * @param  {object} options {@link https://docs.angularjs.org/api/ng/service/$http#usage|as specified in the Angular documentation}
         * @return {Promise<number>} the number results to be returned (deferred)
         */
        'object': function(options){
            return icat.query([this.build('count')], options).then(function(results){
                return results[0];
            });
        },

min() and max() are similar to count(), but also specify a fieldName to build(), so the resulting select expression will be of the form select min(entityType.fieldName).

    this.min = helpers.overload({
        'string, object': function(fieldName, options){
            return icat.query([this.build('min', fieldName)], options).then(function(results){
                return results[0];
            });
        },
        'string, promise': function(fieldName, timeout){
            return this.min(fieldName, {timeout: timeout});
        },
        'string': function(fieldName){
            return this.min(fieldName, {});
        }
    });

    this.max = helpers.overload({
        'string, object': function(fieldName, options){
            return icat.query([this.build('max', fieldName)], options).then(function(results){
                return results[0];
            });
        },
        'string, promise': function(fieldName, timeout){
            return this.max(fieldName, {timeout: timeout});
        },
        'string': function(fieldName){
            return this.max(fieldName, {});
        }
    });

The impliedPathsToImpliedSteps function is used when building the joins. It uses both the impliedPaths and steps structures constructed previously. It appears to be constructing a map from path expressions of the form variable.relationship to variables.

    this.impliedPathsToImpliedSteps = function(impliedPaths){
        var out = {};
        _.each(impliedPaths, function(path, variableName){
            var currentVariableName = entityType;
            var currentEntityType = entityType;
            _.each(path, function(relationship){
                if(currentEntityType == 'proposal') currentEntityType = 'investigation';
                if(currentVariableName == 'proposal') currentVariableName = 'investigation';
                var stepPair = currentVariableName + "." + relationship;
                var entityPair = currentEntityType + "." + relationship;
                currentVariableName = steps[entityPair];
                out[stepPair] = currentVariableName;
                currentEntityType = icatSchema['variableEntityTypes'][currentVariableName];
            });
        });
        return out;
    };

Refactoring potential

At first sight, the use of "abstract" constructors such as .where() suggest that IcatQueryBuilder could be refactored so that the final result is not a string but a more abstract representation of a query that may be safer to pass across in an API. Perhaps the work done in .build() could be moved over to the server? However, we would still want to avoid an API that allows arbitrary expression-strings to be supplied as potential where-clauses; I am not convinced that such a refactoring would entirely prevent malicious query injection.

The parts played by helpers.service

Topcat's helpers.service is a catch-all for a number of utility functions, both general (such as support for overloaded definitions) and specific (such as buildQuery). The buildQuery() function is used as the last step in query construction by IcatQueryBuilder.build(). (It is also used when ad-hoc queries are submitted, see later.)

Recall that the value constructed in IcatQueryBuilder is a nested list of query fragments, which can be strings or lists of parameterised strings followed by their arguments. The first stage of buildQuery() works through the nested lists; if it finds a function object, it invokes it and substitutes the returned value; nested lists are flattened. It repeats until no more functions or nested lists are found; the resulting object is still a list (I am not sure how many levels there may be at this point!)

	this.buildQuery = function(query){
		while(true){
        	query = _.map(query, function(i){
        		if(helpers.typeOf(i) == 'function') i = i.call(this);
        		return i;
        	});
        	query = _.flatten(query);
        	var isFunction = _.select(query, function(i){ return helpers.typeOf(i) == 'function'; }).length > 0;
        	var isArray = _.select(query, function(i){ return helpers.typeOf(i) == 'array'; }).length > 0;
        	if(!isFunction && !isArray) break;
        }

I am not certain I have found all situations where functions are passed. One example is in tc.service's search() (see below). This may be a stumbling-block in any attempt to move query construction to the server, as the function may have to be evaluated in the client browser.

Next, any undefined elements are removed:

	        query = _.select(query, function(i){ return i !== undefined; });

The next stage replaces any "?" placeholders with values, where the values are drawn from successive elements in the list. Note that each fragment is processed through the jpqlSanitize() function, see below. The final result is flattened to a single string (I think!)

	        try {
	        	var _query = [];
	        	for(var i = 0; i < query.length; i++){
	        		var expression = [];
	        		var fragments = query[i].split(/\?/);
	        		for(var j in fragments){
	        			expression.push(fragments[j]);
	        			if(j < fragments.length - 1){
	        				i++;
	        				expression.push(helpers.jpqlSanitize(query[i]));
	        			}
	        		}
	        		_query.push(expression.join(''));
	        	}
	        } catch(e) {
	        	console.error("can't build query", query, e)
	        }
	        return _query.join(' ');
		};

The jpqlSanitize() function "merely" quotes (and escapes quotes in) arguments that are "unsafe" strings. This works in tandem with the String.safe() method seen earlier: this converts a String to a SafeString, which then returns true to .isSafe(); hence .safe() is used earlier to indicate that a string does not require to be quoted. (I believe this means that any numeric values passed as strings must be flagged using .safe().)

		this.jpqlSanitize = function(data){
			if(typeof data == 'string' && !data.isSafe){
				return "'" + data.replace(/'/g, "''") + "'";
			}
			return data;
		};

Actual query constructions

The largest part of query construction in practice is in the definition of the where clauses. The .where() method is used in the following places (excluding those within the QueryBuilder itself) and ways:

  • browse-entities.controller
      out.where(["facility.id = ?", facilityId]);
      out.where(["investigation.name = ?", id]);
      out.where(["?.id = ?", variableName.safe(), parseInt(id || "-1")]);
      out.where("investigationInstrument.id = instrument.id")
      out.where([
                            "? between {ts ?} and {ts ?}",
                            columnDef.jpqlFilter.safe(),
                            from.safe(),
                            to.safe()
                        ]);
      out.where([
                            "? between ? and ?",
                            columnDef.jpqlFilter.safe(),
                            from,
                            to
                        ]);
      out.where("datafileParameterType.name = 'run_number'")
      out.where([
                        "UPPER(?) like concat('%', ?, '%')", 
                        columnDef.jpqlFilter.safe(),
                        columnDef.filter.term.toUpperCase()
                    ]);
  • meta-panel.controller
      .where(entity.type + ".id = " + entity.id);

Note: meta-panel.controller's use of the queryBuilder is slightly unusual - see below.

  • my-data.controller
      out.where("investigationUser.name = :user");
      out.where([
                            "? between {ts ?} and {ts ?}",
                            columnDef.jpqlFilter.safe(),
                            from.safe(),
                            to.safe()
                        ]);
      out.where([
                            "? between ? and ?",
                            columnDef.jpqlFilter.safe(),
                            from,
                            to
                        ]);
      if(columnDef.where) out.where(columnDef.where);
      out.where([
                        "UPPER(?) like concat('%', ?, '%')", 
                        columnDef.jpqlFilter.safe(),
                        columnDef.filter.term.toUpperCase()
                    ]);
      icat.queryBuilder(entityType).where([
                                "investigation.id = ?", entity.id,
                                "and datafileParameterType.name = 'run_number'"
                            ]).min('numericValue', canceler.promise).then(function(min){
                                entity['min' + fieldNameSuffix] = min;
                            });
      icat.queryBuilder('datafileParameter').where([
                                "investigation.id = ?", entity.id,
                                "and datafileParameterType.name = 'run_number'"
                            ]).max('numericValue', canceler.promise).then(function(max){
                                entity['max' + fieldNameSuffix] = max;
                            });

Though we should also consider query constructions that don't use the IcatQueryBuilder, the relatively small number of clause-strings (albeit parameterised) suggests that it might be possible to limit uses of .where() to a particular set of clauses (say, by registering clauses such as "investigation.id = ?" on the server side, and rejecting requests that attempt to use unrecognised clauses. However, the inclusion of columnDef.where (which brings in arbitrary query fragments from the configuration) precludes this. We could require that any such fragments be declared on the server side, but such duplication of configuration would be ungainly.

That said, I am only aware of one example of a where clause specified in the configuration: in ISIS, a custom where clause is used in the "Run Number" column:

            "myData": {
                "entityType" : "investigation",
                "gridOptions": {
                    "enableSelection": false,
                    "columnDefs": [
                        ...,
                        {
                            "field": "datafileParameter.numericValue",
                            "title": "BROWSE.COLUMN.INVESTIGATION.RUN_NUMBER",
                            "where": "datafileParameterType.name = 'run_number'"
                        },

Special case: meta-panel.controller

Though meta-panel.controller uses the IcatQueryBuilder, it doesn't trigger the variable-driven join calculations, as the query it constructs does not refer to variables used in the meta-tab fields. Instead, it has a hard-wired list of inclusions in the code, depending on the current entityType:

            var queryBuilder = facility.icat().queryBuilder(entity.type).where(entity.type + ".id = " + entity.id);

            if(entity.type == 'instrument'){
                queryBuilder.include('instrumentScientist');
            }

            if(entity.type == 'investigation'){
                queryBuilder.include('user');
                queryBuilder.include('investigationParameterType');
                queryBuilder.include('sample');
                queryBuilder.include('publication');
                queryBuilder.include('study');
                queryBuilder.include('investigationUser');
                queryBuilder.include('studyInvestigation');
            }

            if(entity.type == 'dataset'){
                queryBuilder.include('datasetParameterType');
                queryBuilder.include('sample');
                queryBuilder.include('datasetType');
            }

            if(entity.type == 'datafile'){
                queryBuilder.include('datafileParameterType');
                queryBuilder.include('datafileFormat');
            }

This means that for each entityType, the set of variables available in the meta-tabs is limited to those reachable via the hard-wired inclusions. I do not know why a particular set of inclusions was chosen; it may have been driven by the requirements at the time. (The study and studyInvestigation inclusions were added to the investigation case recently so that study details can be referenced.)

There may be an error here: I have noticed that actual queries generated for the meta-panel never seem to include any samples. I suspect that instead of .include("sample") the investigation and dataset cases should do .include("investigationSample") and .include("datasetSample") instead. Yet no-one seems to have complained!

Ad-hoc query constructions

There are a number of query constructions in Topcat that do not use the IcatQueryBuilder, presumably because they are relatively trivial for the most part. Some examples include:

  • breadcrumb.controller:
    // specific proposal case:
    facility.icat().query(["select investigation from Investigation investigation where investigation.name = ?", entityId, "limit 0, 1"], 
        timeout).then(function(entities){

    // general case:
    facility.icat().query(["select ? from ? ? where ?.id = ?", entityType.safe(), helpers.capitalize(entityType).safe(), 
                          entityType.safe(), entityType.safe(), entityId, "limit 0, 1"], timeout)
                   .then(function(entities){
  • browse-facilities.controller:
    facility.icat().query(["select facility from Facility facility where facility.id = ?", facility.config().id])
                   .then(function(facilities){
  • doi-redirect.controller:
    tc.icat(facilityName).query(["select ? from ? ? where ?.id = ", entityType.safe(), helpers.capitalize(entityType).safe(), 
                                entityType.safe(),entityType.safe(), entityId])
                         .then(function(entities){
  • search-parameter.controller:
    facility.icat().query([
        "select parameterType from ParameterType parameterType",
        "where",
        "parameterType.applicableToInvestigation = true or",
        "parameterType.applicableToDataset = true or",
        "parameterType.applicableToDatafile = true",
        "include parameterType.permissibleStringValues"
    ]).then(function(parameterTypes){
  • tc-icat-entity.service:
    // this.getDatasetCount():
        var query = "select count(dataset) from Dataset dataset, dataset.investigation as investigation where investigation.id = ?";
        ...
        return icat.query([query, that.id], options)...

        var query = parentQueries[entity.entityType][parentEntityName](ids);
					return icat.query(query)...
  • tc-icat.service : many uses, including:
    // this.login():
      that.query(["SELECT facility FROM Facility facility WHERE facility.name = ?", name ...
      that.query([
            "SELECT datasetType FROM DatasetType datasetType, datasetType.facility as facility", 
            "WHERE facility.name = ?", name,
            "AND datasetType.name = ?", idsUploadDatasetType
      that.query(["select user from User user where user.name = ?", username]).then(function(users){
  • tc-user-cart.service:
    cartItem.entity = helpers.overload({
      ...
      return facility.icat().query([
                    "select ? from ? ? where ?.id = ?",
                    this.entityType.safe(),
                    helpers.capitalize(this.entityType).safe(),
                    this.entityType.safe(),
                    this.entityType.safe(),
                    this.entityId
                ], options).then(
  • tc.service:
    this.search = helpers.overload({
      ...
	    var query = [
	      function(){
	        if(entityType == 'Investigation'){
	          return 'select investigation from Investigation investigation';
	        } else if(entityType == 'Dataset') {
	          return 'select dataset from Dataset dataset';
	        } else {
	          return 'select datafile from Datafile datafile';
	        }
	      },
	      'where ?.id in (?)', entityInstanceName.safe(), ids.join(', ').safe(),
	      function(){
	        if(entityType == 'Investigation'){
	          return 'include investigation.investigationInstruments.instrument';
	        } else if(entityType == 'Dataset'){
	          return 'include dataset.investigation';
	        } else if(entityType == 'Datafile') {
	          return 'include datafile.dataset.investigation';
	        }
	      }
	    ];
       promises.push(icat.query(query, options).then(function(_results){...

Note that many of these queries are parameterised. icat().query(queryList)) (from tc-icat.service) uses helpers.buildQuery to construct a final query string from queryList; this string is then passed directly to an ICAT entityManager request.

There are more complex instances; for example, tc-icat-entity.service has another query-construction mechanism built around traversal of a map from entities to queries for their parent entities.

In tc.service, .search() takes a query string, submits it to an ICAT lucene/data request, then constructs new queries based on chunks of the returned IDs. These are not passed to the IcatQueryBuilder, but the query list is an example that contains functions, which will be invoked by helpers.buildQuery().