Skip to content

Grouping and aggregation

James Cheney edited this page Jul 26, 2023 · 2 revisions

PR #1135 extends the Links language-integrated query mechanism, making it possible for the user to write database queries that return, or employ as intermediate data structures, finite maps rather than finite relations. Finite maps are used to represent the result of grouping a relation, and aggregation can then be applied groupwise to a finite map to obtain again a relation. Such Links queries are translated to SQL queries using group by and aggregation.

In the Links type system, the type of finite maps is of the form [(k, [v])], where k and v are tuple types, respectively for keys and values. For every key x : k, the finite map contains at most one pair (x, [y1, ..., yn]) associating to the key x the collection of values [y1, ..., yn].

Finite maps, and thus queries with grouping and aggregation, are only available in the mixing normaliser. The standard normaliser is still limited to working with relations.

Usage

Grouping

The main way to construct a finite map is by means of the grouping operator groupBy

groupBy;
fun : ((a) -b-> c, [a]) -b-> [(c, [a])

groupBy(f, l) takes a list l and a grouping criterion f: it applies f to each item in the list to obtain its grouping key, and finally returns a finite map, i.e. a list of pairs associating to each grouping key the list of items of the original list l that share the same grouping key.

e.g.

groupBy(fun (x) { (is_even = even(x)) }, [1, 2, 3]) = [((is_even = true), [2]), ((is_even = false), [1,3])]

The input of groupBy should be a relation rather than a finite map; to allow for iterated grouping, we provide a groupByMap operation that uses a finite map input:

groupByMap;
fun : ((a) -b-> c, [(d, [a])]) -b-> [((c, d), [a])]

Notice that iterated use of groupByMap produces nested tuple types as keys: however, Links flattens these tuples transparently when converting them to SQL.

Comprehension

Finite maps extend comprehension syntax in two ways:

  • the output of a (standard) comprehension can now be a finite map (however, the input must always be a relation)
for (x <-- table) groupBy(f, l)
  • we add a new key comprehension to iterate on the (duplicate-free) set of keys of a finite map: this is presently supported by the function concatMapKey:
concatMapKey;
fun : ((a) -b-> [c], [(a, [_])]) -b-> [c]

concatMapKey works like concatMap, but it takes as input a finite map rather than any list; it performs comprehension over the (deduplicated) key set of the input map (for this reason, the output type of the finite map argument is irrelevant). The output can be a relation or a finite map.

Lookup

lookupG;
fun : (a, [(a, [b])]) -> [b]

Given a key k and a finite map m, lookupG(k,m) returns the list of values associated by the map m to the key k.

Aggregation

Queries support the following aggregation operators:

sum, sumF, avg, avgF, min_list, minF_list, max_list, maxF_list, length

(where the F variants accept Float inputs rather than Int)

If l : [Int], one can for instance use sum(l) within a databaseable query. However, in the most common case of queries using grouping and aggregation together, the following groupwise aggregation operator is easier to use and typically produces more efficient SQL:

aggBy;
fun : ([(a, [b])], ([b]) -c-> d) -c-> [(a, d)]

aggBy(m, f) takes a finite map m and an aggregation criterion f: its purpose is to apply the aggregation to the input map on a key by key basis. The implementation here is admittedly hacky: we can only support f when it is textually in the form:

fun (t) { (outlabel1 = agg1(for (x <- t) [x.inlabel1]), ..., outlabeln = aggn(for (x <- t) [x.inlabeln])) }

Where agg1, ... aggn are aggregation functions.

Any other use of aggBy will NOT be databaseable.

Example queries can be found in the database/grouping.links test file (which requires tables created by database/grouping-create.links)