Saurabh Asthana edited this page May 15, 2017 · 3 revisions

The Magma Query API lets you pull data out of Magma through an expressive query interface.

The basic form of a query is a series of predicates, passed in as an array (a "question"). All questions take the same basic form:

  1. Specify the model whose entities we wish to map (e.g. 'sample' to return information about samples)
  2. Specify the data item we wish to map (e.g. 'processor')
  3. Return the data in (2) for each entity from (1).

In its simplest form we may think of the question as a path through our model graph. For example, let's consider a simple structure, Project > Experiment > Sample, where a Project has many Experiments and each Experiment has many Samples. Let's say we want to know all of the processors for the Samples for a given Experiment. We could write this query like so:

[ 'experiment', 'sample', 'processor' ]

Here we are specifying a path through the graph by starting with a model and following one of its attributes to a linked model. In this case, the Experiment model has a 'sample' attribute that points to the Sample model. The Sample model has an attribute 'processor', which is a piece of data.

All questions thus begin at a model and end at a piece of data (string, integer, date, etc.)

But note that this is NOT a valid question (yet).

Unfortunately the actual data set has many "experiments" and "samples", so this query does not suffice. We need to apply filters to deal with multiple items.

A filter is simply another path through the graph ending at another piece of data, usually with a Boolean test applied to it. Continuing our example:

[ 'experiment', [ 'name', '::equals', 'Colorectal' ], 'sample', 'processor' ]

Here we have added a filter to our query, which branches off the 'experiment' predicate. The Experiment model has an attribute 'name', and our filter tests that it is equal to the string 'Colorectal'. All filters must reduce to a Boolean value, otherwise the query is invalid. We can specify any number of filters after a model predicate.

[ 'experiment', [ 'name', '::equals', 'Colorectal' ], [ ... something else ], [ ... something else ], 'sample', 'processor' ] 

So far we have defined our entity, filtered it, and mapped it to a value. The last thing we need to do is decide how to collapse lists. In our example, 'experiment' is a set of items. Even if our filter reduces this set to a single item, we still have a choice. Do we want a single item, or a list of items as our result? E.g., [ 'experiment' ] vs. 'experiment'. We must do one or the other by giving an argument to the model predicate:

[ 'experiment', [ 'name', '::equals', 'Colorectal' ], '::first', 'sample', '::all', 'processor' ]

Now we have a valid question! "::first" means we want only the single item that our experiment set reduces to. "::all" means we return a list with one entry for every matching sample of that experiment.
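As a minimal sketch, the valid question above can be written as a nested Python list and encoded as JSON. How the encoded question is actually submitted to Magma is not covered here, so the sketch stops at serialization; the variable names are illustrative only.

```python
import json

# The valid question from the text, written as a nested Python list.
question = [
    "experiment",
    ["name", "::equals", "Colorectal"],  # filter branching off 'experiment'
    "::first",                           # reduce 'experiment' to a single item
    "sample",
    "::all",                             # one entry per matching sample
    "processor",                         # the piece of data we map to
]

# JSON arrays nest the same way as the question, so encoding is direct.
encoded = json.dumps(question)
```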

The output of our example query will be a list of processors, along with the identifier for the associated sample (usually a named attribute on the model, e.g. 'sample_name', but sometimes just a primary key # if the model has no identifier). E.g.: [ [ "Sample1", "Dopey" ], [ "Sample2", "Sleepy" ], [ "Sample3", "Bashful" ] ]
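Since each entry in the result pairs an identifier with a value, it converts naturally into a lookup table. A small sketch, using the example output above as literal data:

```python
# Example output copied from the text: [identifier, value] pairs.
result = [["Sample1", "Dopey"], ["Sample2", "Sleepy"], ["Sample3", "Bashful"]]

# Build a mapping from sample identifier to processor.
processor_by_sample = {identifier: value for identifier, value in result}
```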

You may complain that we have lost the 'experiment' in our map, and our question does not actually map "experiment" to "processor" - this is true, and shows some flexibility in the query API. For example we could write this same query like this:

[ 'sample', [ 'experiment', 'name', '::equals', 'Colorectal' ], '::all', 'processor' ]

You may meditate on why these two produce the same result!
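One way to think about it is with a toy in-memory model. The sketch below does not use Magma at all: the records, the 'Melanoma' experiment and both function names are invented for illustration, and each function imitates in plain Python what one of the two questions asks for.

```python
# Toy data (invented): every sample links back to one experiment by name.
experiments = [{"name": "Colorectal"}, {"name": "Melanoma"}]
samples = [
    {"sample_name": "Sample1", "processor": "Dopey", "experiment": "Colorectal"},
    {"sample_name": "Sample2", "processor": "Sleepy", "experiment": "Colorectal"},
    {"sample_name": "Sample3", "processor": "Grumpy", "experiment": "Melanoma"},
]

def from_experiment(name):
    # Imitates: [ 'experiment', [ 'name', '::equals', name ], '::first',
    #             'sample', '::all', 'processor' ]
    experiment = next(e for e in experiments if e["name"] == name)  # filter, then ::first
    return [
        [s["sample_name"], s["processor"]]
        for s in samples
        if s["experiment"] == experiment["name"]  # follow experiment -> sample
    ]

def from_sample(name):
    # Imitates: [ 'sample', [ 'experiment', 'name', '::equals', name ],
    #             '::all', 'processor' ]
    return [
        [s["sample_name"], s["processor"]]
        for s in samples
        if s["experiment"] == name  # the filter follows sample -> experiment
    ]
```

Both functions select the same set of samples; they only differ in which end of the experiment/sample link they start from.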

Arguments to predicates

There are a handful of predicate types, each of which takes various arguments.

Model

A Model predicate is our query starting point and specifies a set of records. Model predicates can accept an arbitrary number of filter [] arguments, followed by:

::first - reduce this model to a single item
::all - return a vector of values for this model, labeled with this model's identifiers

Record

A Record predicate follows a Model predicate. The valid arguments are:

<attribute_name> - a string specifying an attribute on this model
::has, <attribute_name> - a boolean test for the existence of <attribute_name> (i.e., the data is not null)
::identifier - an alias for the attribute_name of this Model's identifier. E.g., if a Sample has identifier attribute 'sample_name', '::identifier' will return the same value as 'sample_name'
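A sketch of how these arguments might combine, assuming '::has' is used inside a filter and '::identifier' as the final mapped value. The question is written as a Python list; the toy records and the has() helper are invented to imitate the described behavior and are not part of Magma.

```python
# Keep only samples whose 'processor' is not null, then map each to its identifier.
question = ["sample", ["::has", "processor"], "::all", "::identifier"]

# Toy records (invented) to imitate what the question asks for.
records = [
    {"sample_name": "Sample1", "processor": "Dopey"},
    {"sample_name": "Sample2", "processor": None},
]

def has(record, attribute_name):
    # Imitates '::has': true when the attribute's data is not null.
    return record.get(attribute_name) is not None

# 'sample_name' plays the role of the identifier attribute here.
identifiers = [r["sample_name"] for r in records if has(r, "processor")]
```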

String

Column attributes usually just return their value. However, you may optionally follow them with arguments to apply a boolean test.

::equals, <string> - A boolean test for equality, e.g. [ 'sample_name', '::equals', 'Dumbo' ]
::in, [ list of strings ] - A boolean test for membership, e.g., [ 'sample_name', '::in', [ 'ant', 'bear', 'cat' ] ]
::matches, <string> - A boolean test for a regular expression match, e.g., [ 'sample_name', '::matches', '[GD]umbo' ]
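The three string tests can be imitated in plain Python to make their meaning concrete. This is a sketch, not Magma's implementation: string_test is a hypothetical helper, and treating '::matches' as an unanchored regular expression search is an assumption.

```python
import re

def string_test(value, test, argument):
    """Imitate a string predicate test and return a boolean."""
    if test == "::equals":
        return value == argument
    if test == "::in":
        return value in argument
    if test == "::matches":
        # Assumption: the pattern may match anywhere in the value.
        return re.search(argument, value) is not None
    raise ValueError(f"unknown string test: {test}")
```

For example, string_test('Gumbo', '::matches', '[GD]umbo') is true, while the same test on 'Jumbo' is false.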

Integer, DateTime

Both of these column predicate types take the same test arguments:

::<= - less than or equals
::< - less than
::>= - greater than or equals
::> - greater than
::= - equals
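The comparison tests map directly onto Python's standard comparison functions. The sketch below is illustrative only: COMPARISONS and compare are hypothetical names, and it works for integers and datetime values alike because Python compares both with the same operators.

```python
import operator
from datetime import datetime

# Each test argument paired with the equivalent Python comparison.
COMPARISONS = {
    "::<=": operator.le,
    "::<": operator.lt,
    "::>=": operator.ge,
    "::>": operator.gt,
    "::=": operator.eq,
}

def compare(value, test, argument):
    """Imitate an Integer or DateTime predicate test and return a boolean."""
    return COMPARISONS[test](value, argument)

# The same helper handles both column types.
small_enough = compare(3, "::<=", 3)
is_later = compare(datetime(2017, 5, 15), "::>", datetime(2017, 1, 1))
```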

File

::url - a URL to retrieve this file resource
::path - the filename/path for this file resource