Query Language '98

Cam Saul edited this page May 8, 2018 · 44 revisions
Intro

MBQL, the Metabase Query Language, is the language used to express "structured" queries in Metabase. It is generated by the Query Builder in the Metabase frontend, and translated into a myriad of "native" query languages -- including SQL, Mongo Aggregation Pipelines, and Druid -- by the backend. (Metabase can also run queries in these native languages directly, but those are not considered part of MBQL).

If you're new to MBQL or this document, please familiarize yourself with MBQL's core concepts, specifically the conventions used in the MBQL grammar and examples.

Quick Reference

Conceptual

Conventions

For the most part, this document uses Clojure in the examples and its terminology and syntax.

  • Literal :tokens are written in monospaced font with a preceding :. In queries, you may pass these tokens as either strings or Lisp keywords. See the conceptual explanation of tokens for more information.
  • parameter-names are used in certain cases for clarification.
  • optional varargs are written as in a Clojure binding, using &
  • Maps/dictionaries/hashes/associative arrays are written in Clojure syntax, e.g. {:key "value"}

For example, a grammar like this:

Could correspond to one of these clauses in real life:

[:= [:field-id 5] 100]
["=" ["field-id" 5] 100 200]

In examples, -> is used to denote the output of a query or expression, especially in cases where an expression is automatically translated to another form. <angle-brackets> are used for clarity to represent arguments to expressions instead of giving literal values like in the previous example:

[:= <field> <v1> <v2>]

  -> [:or [:= <field> <v1>]
          [:= <field> <v2>]]

Tokens

Tokens are literal keywords (in the Lisp/Ruby sense) or strings in MBQL and have special meaning, much like keywords in a programming language. You'll usually see them at the beginning of a bracketed expression:

[:= <field> <value>] ; := is a token

When writing a query, you may use either Clojure keywords or strings for tokens. Tokens are also case-insensitive and lisp-case/snake_case insensitive. For example, these query fragments are all understood to be the same thing:

{:source_table 100}
{:source-table 100}
{"SOURCE_TABLE" 100}
{"source_table" 100}

In the examples in this document, tokens are written in a monospaced font with a preceding colon, like :lisp-keywords. For consistency, all tokens are written in (lower-case) lisp-case.

MBQL as a Square-Bracketed Lisp

Most of the forms in MBQL '98 look something like:

[:= [:field-id 100] "COOL"]

It may be helpful to think of MBQL as a square-bracketed LISP; indeed, this is how such expressions are internally implemented. The expression above becomes a function call like:

(= (field-id 100) "COOL")

Rvalues

The parameters to MBQL's square-bracked funcalls are rvalues in the C++ sense:

An rvalue is a temporary value that does not persist beyond the expression that uses it.

An rvalue is either a field reference or a literal value.

Field References

A field reference rvalue is a concrete reference to a metabase.models.field/Field (such as its ID) or some derivative calculation thereof (an expression reference), or an aggregate field reference to one of the fields produced by the query's aggregation being performed, such as :count.

Concrete Field References

A concrete Field reference refers to a column represented internally by a metabase.models.field/Field. All columns in a database are mapped to Fields on the backend, so a concrete Field reference effectively means any existing column in the database (as opposed to one generated as the result of a query, such as a count column returned when doing an aggregation). You can think of a concrete field as any column that would come back when doing a SQL SELECT(*) query.

A field reference may be local or foreign, optionally wrapped by a datetime-field form; for most intents and purposes, a expression reference is also considered a "concrete field" and may be used in the same places.

Local Concrete Field References

Local field references are the most common type of field references. Local simply means that the Field being referenced belongs to a query's source table (as opposed to a different Table in the Database).

Local field references look like:

e.g.

[:field-id 100] ; the metabase.models.Field whose ID is 100
Notes

In older versions of MBQL, raw integer literals were used for local field references. This caused ambiguity with integer literals in certain clauses, so each clause assumed certain argument positions were always field references, while others were always assumed to be values. As such, in older versions of MBQL it was impossible to include multiple Fields in a single filter clause:

;; MBQL '98
{:filter [[:= [:field-id 10] [:field-id 11]]]} ; return rows where Field 10 == Field 11

;; MBQL '95
{"filter" [["=" 10 11]] ; Oops! Return rows where Field 10 == 11

MBQL '98 eliminates this ambiguity in order to relax the restrictions on where Fields can be used in filter clauses.

Foreign Concrete Field References

Foreign field references are just like local references, but are used to refer to Fields belonging to a Table other than the source table. Access to these Fields requires JOINS to their parent Table. To avoid ambiguity when performing the join, you must specify a Field belonging to the source Table that is a foreign key to use to perform the join.

Foreign field references look like:

ex.

;; Order by Field 20, which belongs to a different Table. Join to its Table via Field 10, which belongs to the source Table.
{:order-by [[:asc [:fk-> 10 20]]}

It may be helpful to think of the form as suggesting you access dest-field-id through fk-field-id.

The Field with fk-field-id must have a corresponding metabase.models.ForeignKey instance, which should be created automatically during the sync process. For obvious reasons, databases that don't support joins cannot use foreign field references; drivers that do support them declare the feature :foreign-keys.

Datetime Fields

Datetime Fields wrap a datetime field and specify how its values should be bucketed. Datetime fields used without being wrapped in a datetime-field form are automatically converted to datetime-field forms with :day bucketing for legacy backward-compatibility reasons. This is not usually what you want, so be sure to use the :datetime-field form to pick the bucketing you'd like!

datetime-field forms can be used anywhere you can otherwise use a local or foreign field reference.

;; Row count broken out by Field 20, grouped by day (the default)
{:aggregation [:count]
 :breakout    [[:field-id 20]]}

;; Row count broken out by Field 20, grouped by year
{:aggregation [:count]
 :breakout    [[:datetime-field [:field-id 20] :year]]}
Note

Older versions of MBQL required an AS in the datetime-field form in a misguided attempt to improve readability, e.g.:

;; MBQL '95
["datetime_field" 100 "as" "week-of-year"]

This parameter should be considered deprecated now that MBQL has been re-envisioned as a square-bracketed Lisp. MBQL '98 will currently accept the older syntax, albeit begrudgingly.

Aggregate Field References

An aggregate field reference refers to one of the fields produced by the query's aggregation, such as :count. It has the form:

index is the index of the aggregation in question. Since MBQL current only supports a single aggregation clause, index currently must always be zero.

aggregate-field clauses are primarily used for ordering results:

{:aggregation [:sum [:field-id 10]]
 :breakout    [[:field-id 20]]
 :order-by    [[:desc [:aggregation 0]]} ; order by sum, descending

Expression Reference

An expression reference refers to one of the expressions added to the query. Suppose you have an :expressions clause like this:

{:expressions {:sales_tax [:- <id of total> <id of subtotal>]}}

You can refer to this expression elsewhere in the query, e.g. in the :order-by clause:

{:order-by [[:desc [:expression :sales_tax]]]} ; sort results of the query by the value of (total - subtotal)

For purposes of this grammar an expression reference can be considered a concrete field reference and used anywhere one is accepted. The sole exception is within the :expressions clause itself, where expression references aren't allowed.

Values

A value can be nil, a boolean (true or false), a string literal, a numeric literal, or a datetime literal.

Values are used in filter clauses to restrict the results returned by a query. Some filter clauses, like =, can accept any type of value; others are more restrictive, and only accept certain types of values. Several filters such as < only accept values that are considered orderable, i.e. numbers and datetimes.

String Literals

A (presumably UTF-8 encoded) string literal. Implemented internally as a java.lang.String.

"I'm a string literal!😋"

Orderable Values

An orderable value is something that makes sense in comparisons like < -- either a numeric literal or a datetime literal.

[:= <field> true] ; makes sense
[:< <field> true] ; doesn't make sense -- values of a field cannot be "less than" true
[:< <field> 100]  ; makes sense, values of a field can be less than 100
Numeric Literals

Either integer literals or floating-point literals.

Integer Literals

A integer. Internally, these are represented java.lang.Integer, which is 32-bit; thus MBQL integers cannot be less than -231 or greater than 231-1.

Floating-Point Literals

A double-precision floating point number. Internally, these are represented by java.lang.Double.

Datetime Literals

A datetime literal represents a moment in time, either absolute or relative to the moment the query is ran.

ISO-8601 Datetime Literals

ISO-8601 formatted strings are used to represent an absolute moment in time.

Some examples of valid ISO-8601 strings:

"2014"
"2014-09"
"2014-09-21"
"2015-09-20T19:31"
"2015-09-20T19:31:36"
"2015-09-20T19:31:36Z"
"2015-09-20T19:31:36.000"
"2015-09-20T19:31:36.000Z"
"2015-09-20T19:31:36.000+00:00"
Note

Some QP code actually parses datetime strings according to RFC 3339, which is basically a subset of ISO 8601. You probably won't need to worry about the differences between the two; almost every string valid in format is valid in the other. ISO-8601 is a much more widely-known standard than RFC 3339; for the sake of familiarity the rest of this document only specifically mentions the former.

Relative Datetime Forms

Relative datetimes represent a point in time relative to each moment the query is ran, such as today or 1 year ago. A relative datetime is expressed as one of the following forms:

The first form represents the moment in time a query is ran. It is analogous to the SQL NOW() function.

The second form represents a moment in time n units from the moment the query is ran. For example, you could express 31 days ago as follows:

[:relative-datetime -31 :day]
Relative Datetime units

Valid units for a relative datetime form are:

  • :minute
  • :hour
  • :day
  • :week
  • :month
  • :quarter
  • :year

Subtraction and addition of units are done simply; in other words, -2 :day is the same as -48 :hour or -2880 :minute.

Datetime Units

These units are used to specify the desired bucketing of fields with the datetime-field form. Bucketing, as used in this document, refers collectively to both datetime truncation and extraction operations. Truncation operations are flooring operations that convert values to a lower resolution (such as the start of the current day or of the current month); extraction operations extract some element of a date (such as the month) without regard to other elements.

Truncation is analogous to date_trunc in Postgres, while extraction is analogous to extract.

Token Operation Example Description
default N/A 2016-02-01 21:59:20 No bucketing should take place -- use the default resolution of the column in question, such as seconds or milliseconds.
minute truncation 2016-02-01 21:59 Truncate to the beginning of the current minute.
minute-of-hour extraction 59 The minute of the the current hour. Ranges from 0 - 59.
hour truncation 2016-02-01 21:00 Truncate to beginning of the current hour.
hour-of-day extraction 21 Extract the current hour. Ranges from 0 - 23.
day truncation 2016-01-01 Truncate to the beginning of the current day.
day-of-week extraction 2 [Monday] Day of week, with Sunday as the first day of the week (1) to Saturday(6)
day-of-month extraction 1 The calendar date.
day-of-year extraction 32 Day of the year, with January 1st the first day (1).
week truncation 2016-01-31 Truncate to the beginning of the week (Sunday).
week-of-year extraction 6 Week of the year with the week containing January 1st as week 1.
month truncation 2016-02 Truncate to the first day of the month.
month-of-year extraction 2 [Feb] Extract the month of the year as you'd see on a calendar (e.g. January is 01).
quarter truncation 2016-01-01 Truncate to the beginning of the quarter. Quarters start on Jan 1st, April 1st, July 1st, and October 1st of each year.
quarter-of-year extraction 1 The number of the current quarter (1-4), with the quarter beginning Jan 1st as 1.
year extraction 2016 Extract the current year.

Clauses

Aggregations

An aggregation is analogous to SQL aggregation functions like COUNT() or SUM(). When specified, it consists of a single clause, e.g.

{:aggregation [:sum [:field-id 100]]}

The following are valid aggregation clauses:

Aggregations are often used in combination with breakouts, but this is not required. When an aggregation is used without a breakout, the query will return a single row with a single value. When one or more breakout fields is specified, it will return a row for each distinct value.

Suppose we have a table of venues, with a price column representing the number of $$$ we'd see on a Yelp-like app. A count aggregation with no breakout returns the total number of rows, while breaking out by price returns the count of venues for each given price:

{:aggregation [:count]}

  -> [[100]]

{:aggregation [:count], :breakout [<price>]}

  -> [[1 22]
      [2 59]
      [3 13]
      [4  6]]
Notes

In future iterations of the QL (probably MBQL '98SE or MBQL ME), multiple aggregation clauses will also be allowed, but for now, only a single aggregation can be specified.

Older versions of MBQL supported an additional aggregation type, rows. In MBQL '95, this was made the default behavior, and was ignored by the Query Processor; in MBQL '98, it is considered deprecated, and the QP will issue a warning when it is encountered.

Average Aggregation: avg

Return the average value of a numeric field.

Count Aggregation: count

Return the count of rows in the source-table. This takes a single optional arg, a field:

If field is specified, the query retuns the count of rows where field has a non-nil value.

count is analogous to SQL's COUNT. With no args, it behaves like COUNT(*); with field, it is analogous to COUNT(column_name).

Cumulative Count Aggregation: cum-count

Return the cumulative count of a numeric field. This is useful for doing things like total number of sales over time -- charts that go "up and to the right". 📈

This aggregation is implemented entirely by middleware on the backend, so individual drivers don't need to worry about handling it. Note that it doesn't make particular sense to use this aggregation without any breakout columns; in that case, cum-count will be treated the same as count

Cumulative Sum Aggregation: cum-sum

Return the cumulative sum of a numeric field.

This aggregation is implemented entirely by middleware on the backend, so individual drivers don't need to worry about handling it. Note that it doesn't make particular sense to use this aggregation without any breakout columns; in that case, cum-sum will be treated the same as sum.

Distinct Aggregation: distinct

Return the number of distinct values of a field.

This is analogous to COUNT(DISTINCT column) in SQL.

Standard Deviation Aggregation: stddev

Return the standard deviation of values of a numeric field.

This is not supported by all drivers; drivers that support it declare the feature standard-deviation-aggregations.

Sum Aggregation: sum

Return the sum of all values of a numeric field. This is analogous to SQL SUM(column).

Depending on the database and underlying column type, the results will either by an integer or a floating-point.

Minimum Value Aggregation: min

Return the minimum value of a numeric or datetime field. This is analogous to SQL MIN(column).

Maximum Value Aggregation: max

Return the maximum value of a numeric or datetime field. This is analogous to SQL MAX(column).

breakout

Breakouts group a query by the distinct values of one or more columns. When used in conjunction with an aggregation, they are analogous to GROUP BY statements in SQL. Unlike SQL, however, breakout clauses can be used without aggregate functions; in this case, they return the distinct values of a field, analogous to SQL's SELECT DISTINCT.

;; Get distinct values of Field 10
;; Equivalent SQL: SELECT DISTINCT field_10
{:breakout [[:field-id 10]]}

;; Count of all rows broken out by (i.e., grouped by) distinct values of Field 10
;; Equivalent SQL: SELECT COUNT(*) GROUP BY field_10
{:aggregation [:count]
 :breakout [[:field-id 10]]}

Adding Calculated Columns with the expressions Clause

Expressions are used to add new columns to the results of a query; for each row, their values are derived from calculations involving the values of other columns and/or numeric constants.

Suppose we have a table called sales with data like:

subtotal total
10.00 11.00
50.00 52.50
40.00 41.00

Now suppose we'd like to see what the sales tax for each sale was. We can add an expression to derive sales_tax from total - subtotal. The MBQL form for doing so looks like:

{:expressions {:sales_tax [:- [:field-id <id of total>]
                              [:field-id <id of subtotal>]]}}

The generated SQL will look something like:

SELECT subtotal, total, (total - subtotal) AS sales_tax
FROM sales

The results of a query including this :expressions clause would look like:

subtotal total sales_tax
10.00 11.00 1.00
50.00 52.50 2.50
40.00 41.00 1.00
Note

Expressions are not available for all drivers; drivers that offer them advertise the :expressions feature.

Valid expression Operators

The behavior of the various expression operators above should be self-explanatory. Nested expressions may be used as one or both arguments, as may numeric literals; note, however, that field references should refer to a field with numeric values, or the query will fail (it doesn't make sense to generate to try to divide a string, for example). Field references also may not be expression references, which are otherwise allowed wherever a concrete field is specified; this restriction reflects inherent limitations in underlying database systems.

Referring to Expressions

You can refer to the results of expressions in other clauses. See expression references for more details.

Future Additions

More operators may be added in the future if expressions prove popular. Possibilities include string operators, such as concatenation, uppercasing, lowercasing, etc.

Specifying Which Columns Should Be Returned with fields

Conceptually similar to a SQL SELECT statement; specifies which Fields the query should return. When specified, fields is a sequence of one or more field references:

{:fields [[:field-id 100] [:field-id 200]]}

Filtering Results with filter

filter is used to "filter" the the rows returned by a query, and is analogous to the SQL WHERE clause.

Filter Clauses

A filter clause is one of the following:

and Filter

Compound filter clause: return rows that satisfy all clauses. clauses can be any valid filter clause, including nested compound :and or :or clauses.

If only a single clause is passed, it is treated as if there were no surrounding :and clause:

[:and [:= <field> <value>]] -> [:= <field> <value>]

or Filter

Compound filter clause: return rows that satisfy one or more clauses. clauses can be any valid filter clause, including nested compound :and or :or clauses.

If only a single clause is passed, it is treated as if there were no surrounding :or clause:

[:or [:= <field> <value>]] -> [:= <field> <value>]

not Filter

The not filter returns results that do not match its argument (i.e., it logically negates its argument). not works exactly like SQL's NOT or Clojure's not function.

In most cases a not form is automatically translated to a simpler, logically equivalent form:

(not (and <x> <y>)) -> (or (not <x>) (not <y>))
(not (not <x>))     -> <x>
(not (= <x> <y>)    -> (!= <x> <y>)

The negation of some filters (like starts-with) have no logical equivalent in MBQL, so drivers still need to know how to handle not.

Equals Filter: =

With a single value, return rows where the values of field match value. With multiple values, this behaves like a SQL IN operator, and returns rows that match any of the values.

[:= <field> 100]     ; return rows where <field> == 100
[:= <field> 100 200] ; return rows where <field> == 100 *or* field == 200

Internally, = filters with multiple values are translated to compound statements, e.g.:

[:= <field> 100 200]

  -> [:or [:= <field> 100]
          [:= <field> 200]]

Not Equals Filter: !=

With a single value, return rows where the values of field are not equal to value. With multiple values, this behaves like a SQL NOT IN operator, and returns rows that match none of the values.

[:!= <field> 100]     ; return rows where <field> != 100
[:!= <field> 100 200] ; return rows where <field> != 100 *and* field != 200

Internally, != filters with multiple values are translated to compound statements, e.g.:

[:!= <field> 100 200]

  -> [:and [:!= <field> 100]
           [:!= <field> 200]]

Less Than Filter: <

Return results where field is less than value. field and value must be orderable, i.e. numbers or a datetimes.

Greater Than Filter: >

Return results where field is greater than value. field and value must be orderable, i.e. numbers or a datetimes.

Less Than or Equal Filter: <=

Return results where field is less than or equal to value. field and value must be orderable, i.e. numbers or a datetimes.

Greater Than or Equal Filter: >=

Return results where field is greater than or equal to value. field and value must be orderable, i.e. numbers or a datetimes.

is-null Filter

Return results where field is nil. Convenience for [:= <field> nil]. This filter isn't particularly useful in my opinion and may be deprecated in the future; prefer using = instead.

not-null Filter

Return results where field is non-nil. Convenience for [:!= <field> nil]. This filter isn't particularly useful in my opinion and may be deprecated in the future; prefer using != instead.

between Filter: Values in a Certain Range

Analogous to the SQL BETWEEN operator. Return results where field is between min and max, inclusive. field, min, and max must be orderable, i.e. numbers or datetimes.

between is conceptually similar to

[:and [:>= <field> <min>]
      [:<= <field> <max>]]

and can be implemented by drivers with no corresponding native concept as such.

inside Filter: Values in a Geo Bounding Box

inside is used for doing geo bounding boxes:

Drivers don't need to specifically handle inside filters. Internally, they're automatically translated to to a pair of between clauses:

[:inside <lat> <lon> <lat-max> <lon-min> <lat-min> <lon-max>]

  -> [:and [:between <lat-field> <lat-min> <lat-max>]
           [:between <lon-field> <lon-min> <lon-max>]]

starts-with Filter: Values That Start With a Substring

The starts-with filter restricts results to those where values of string field start with the substring value. Analogous to SQL <field> LIKE '<value>%'.

contains Filter: Values That Contain a Substring

The contains filter restricts results to those where values of string field contain the substring value. Analogous to SQL <field> LIKE '%<value>%'.

e.g.

;; return results where values of string Field 10 contain the substring "BBQ"
{:filter [:contains [:field-id 10] "BBQ"]}

does-not-contain Filter: Values That Do Not Contain a Substring

The does not contain filter restricts results to those where values of string field do not contain the substring value. Analogous to SQL NOT (<field> LIKE '%<value>%').

This filter is just syntactic sugar for wrapping a contains filter in a not form, and is implemented as such on the backend (i.e., individual drivers don't need to worry about handling this clause directly).

[:does-not-contain <field> <string>] -> [:not [:contains <field> <string>]]

Prefer using the equivalent not/contains form instead whenever possible; this form will likely be deprecated at some point in the future. It is provided for the time being solely as a convenience to the Metabase frontend until support for not is implemented.

ends-with Filter: Values That End With a Substring

The ends-with filter restricts results to those where values of string field end with the substring value. Analogous to SQL <field> LIKE '%<value>'.

time-interval Filter: Values in a Certain Datetime Range

time-interval is syntactic sugar for restricting results to ones where datetime field matches a certain datetime or range of datetimes.

;; Assuming today is Feb 8th, 2016...

;; return results where <field> is any time on Feb 8th 2016
[:time-interval <field> :current :day]

;; return results where <field> is during the previous month (any date in Jan 2016)
[:time-interval <field> :last :month]

;; Return results in the last 30 days, excluding the current day (Jan 8th <-> Feb 7th)
[:time-interval <field> -30 :day]

:current, :last, and :next are allowed as values on n for readability purposes and have the same effect as as the integers 0, -1, and 1, respectively.

time-interval is translated into a = or between filter, depending on the value of n.

TODO Examples of translation to =/between clauses

Limiting Results with limit

limit specifies the maximum number of rows that should be returned by the query. It is analogous to a SQL LIMIT clause (or SELECT TOP in MS SQL).

ex.

{:source-table 20
 :limit        10} ; return the first 10 rows of Table 20

n must be greater than zero.

Ordering Results with order-by

You can specify how results should be sorted with order-by. When specified, it consists of one or more order by clauses, each of which specify a field and sort direction (ascending or descending).

order-by is analogous to SQL's ORDER BY. Like SQL, you can specify multiple columns to sort by; sorting is done by the first column, then by the second in the case of duplicate values for the first, and so forth.

Ex.

;; Sort by values of Field 10 in ascending order
{:order-by    [[:asc [:field-id 10]]]}

;; Sort by :count in descending order
{:aggregation [:count]
 :breakout    [[:field-id 10]]
 :order-by    [[:desc [:aggregate-field 0]]]}

Order By Clauses

  • [direction :asc|:desc field field]

Source Table Clause

The source table clause specifies the ID of the Metabase Table that should be queried. A Metabase Table is the internal object used to represent a physical table/collection/datasource/etc and store metadata about it.

This clause is roughly equivalent in purpose to a simple SQL FROM clause (i.e., on with no joins or sub-selects). Local Field References are references to Metabase Fields that belong to this Table. To include data from other Tables (and perform the equivalent of a SQL JOIN) you can include Foreign Field References somewhere in your query.

You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.