SQL: Doc on syntax (identifiers in particular) (#38662)
Add section on syntax, identifiers and literals and on single vs double quotes.
costin authored Feb 15, 2019
1 parent b9fe312 commit aafdb59
Showing 10 changed files with 269 additions and 16 deletions.
26 changes: 24 additions & 2 deletions docs/reference/sql/appendix/syntax-reserved.asciidoc
@@ -5,7 +5,7 @@

Table with reserved keywords that need to be quoted. Also provide an example to make it more obvious.

The following table lists all of the keywords that are reserved in Presto,
The following table lists all of the keywords that are reserved in {es-sql},
along with their status in the SQL standard. These reserved keywords must
be quoted (using double quotes) in order to be used as an identifier, for example:

@@ -31,43 +31,65 @@ s|SQL-92
|`BETWEEN` |reserved |reserved
|`BY` |reserved |reserved
|`CAST` |reserved |reserved
|`CATALOG` |reserved |reserved
|`CONVERT` |reserved |reserved
|`CURRENT_DATE` |reserved |reserved
|`CURRENT_TIMESTAMP` |reserved |reserved
|`DAY` |reserved |reserved
|`DAYS` | |
|`DESC` |reserved |reserved
|`DESCRIBE` |reserved |reserved
|`DISTINCT` |reserved |reserved
|`ESCAPE` |reserved |reserved
|`EXISTS` |reserved |reserved
|`EXPLAIN` |reserved |reserved
|`EXTRACT` |reserved |reserved
|`FALSE` |reserved |reserved
|`FIRST` |reserved |reserved
|`FROM` |reserved |reserved
|`FULL` |reserved |reserved
|`GROUP` |reserved |reserved
|`HAVING` |reserved |reserved
|`HOUR` |reserved |reserved
|`HOURS` | |
|`IN` |reserved |reserved
|`INNER` |reserved |reserved
|`INTERVAL` |reserved |reserved
|`IS` |reserved |reserved
|`JOIN` |reserved |reserved
|`LEFT` |reserved |reserved
|`LIKE` |reserved |reserved
|`LIMIT` |reserved |reserved
|`MATCH` |reserved |reserved
|`MINUTE` |reserved |reserved
|`MINUTES` | |
|`MONTH` |reserved |reserved
|`NATURAL` |reserved |reserved
|`NO` |reserved |reserved
|`NOT` |reserved |reserved
|`NULL` |reserved |reserved
|`NULLS` | |
|`ON` |reserved |reserved
|`OR` |reserved |reserved
|`ORDER` |reserved |reserved
|`OUTER` |reserved |reserved
|`RIGHT` |reserved |reserved
|`RLIKE` | |
|`QUERY` | |
|`SECOND` |reserved |reserved
|`SECONDS` | |
|`SELECT` |reserved |reserved
|`SESSION` | |reserved
|`TABLE` |reserved |reserved
|`TABLES` | |
|`THEN` |reserved |reserved
|`TO` |reserved |reserved
|`TRUE` |reserved |reserved
|`TYPE` | |
|`USING` |reserved |reserved
|`WHEN` |reserved |reserved
|`WHERE` |reserved |reserved
|`WITH` |reserved |reserved
|`YEAR` |reserved |reserved
|`YEARS` | |

|===
23 changes: 12 additions & 11 deletions docs/reference/sql/index.asciidoc
@@ -12,32 +12,33 @@
[partintro]
--

X-Pack includes a SQL feature to execute SQL against Elasticsearch
X-Pack includes a SQL feature to execute SQL queries against {es}
indices and return results in tabular format.

The following chapters aim to cover everything from usage, to syntax and drivers.
Experienced users or those in a hurry might want to jump directly to
the list of SQL <<sql-commands, commands>> and <<sql-functions, functions>>.

<<sql-overview, Overview>>::
Overview of {es-sql} and its features.
<<sql-getting-started, Getting Started>>::
Start using SQL right away in {es}.
<<sql-concepts, Concepts and Terminology>>::
Language conventions across SQL and {es}.
<<sql-security,Security>>::
Securing {es-sql} and {es}.
Secure {es-sql} and {es}.
<<sql-rest,REST API>>::
Accepts SQL in a JSON document, executes it, and returns the
results.
Execute SQL in JSON format over REST.
<<sql-translate,Translate API>>::
Accepts SQL in a JSON document and translates it into a native
Elasticsearch query and returns that.
Translate SQL in JSON format to {es} native query.
<<sql-cli,CLI>>::
Command-line application that connects to {es} to execute
SQL and print tabular results.
Command-line application for executing SQL against {es}.
<<sql-jdbc,JDBC>>::
A JDBC driver for {es}.
JDBC driver for {es}.
<<sql-odbc,ODBC>>::
An ODBC driver for {es}.
ODBC driver for {es}.
<<sql-client-apps,Client Applications>>::
Documentation for configuring various SQL/BI tools with {es-sql}.
Set up various SQL/BI tools with {es-sql}.
<<sql-spec,SQL Language>>::
Overview of the {es-sql} language, such as supported data types, commands and
syntax.
8 changes: 5 additions & 3 deletions docs/reference/sql/language/index.asciidoc
@@ -3,12 +3,14 @@
[[sql-spec]]
== SQL Language

This chapter describes the SQL semantics supported in X-Pack namely:
This chapter describes the SQL syntax and semantics supported, namely:

<<sql-data-types>>:: Data types
<<sql-lexical-structure>>:: Lexical structure
<<sql-commands>>:: Commands
<<sql-data-types>>:: Data types
<<sql-index-patterns>>:: Index patterns

include::syntax/lexic/index.asciidoc[]
include::syntax/commands/index.asciidoc[]
include::data-types.asciidoc[]
include::syntax/index.asciidoc[]
include::index-patterns.asciidoc[]
228 changes: 228 additions & 0 deletions docs/reference/sql/language/syntax/lexic/index.asciidoc
@@ -0,0 +1,228 @@
[role="xpack"]
[testenv="basic"]
[[sql-lexical-structure]]
== Lexical Structure

This section covers the major lexical structure of SQL, which for the most part resembles that of ANSI SQL itself; hence, low-level details are not discussed in depth.

{es-sql} currently accepts only one _command_ at a time. A command is a sequence of _tokens_ terminated by the end of the input stream.

A token can be a __key word__, an _identifier_ (_quoted_ or _unquoted_), a _literal_ (or constant) or a special character symbol (typically a delimiter). Tokens are typically separated by whitespace (such as a space or a tab), though in some cases where there is no ambiguity (typically due to a character symbol) the whitespace is not needed; for readability, however, omitting it should be avoided.

[[sql-syntax-keywords]]
[float]
=== Key Words

Take the following example:

[source, sql]
----
SELECT * FROM table
----

This query has four tokens: `SELECT`, `\*`, `FROM` and `table`. The first three, namely `SELECT`, `*` and `FROM`, are __key words__, meaning words that have a fixed meaning in SQL. The token `table` is an _identifier_, meaning it identifies (by name) an entity inside SQL such as a table (in this case), a column, etc.

As one can see, both key words and identifiers have the _same_ lexical structure and thus one cannot know whether a token is one or the other without knowing the SQL language; the complete list of key words is available in the <<sql-syntax-reserved, reserved appendix>>.
Do note that key words are case-insensitive, meaning the previous example can be written as:

[source, sql]
----
select * fRoM table;
----

Identifiers, however, are not: as {es} is case sensitive, {es-sql} uses the received value verbatim.

To help differentiate between the two, throughout the documentation the SQL key words are upper-cased, a convention we find increases readability and thus recommend to others.
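
As a minimal sketch of the case rules above, assume a hypothetical index `musicians` with a field `first_name` (both names are illustrative only). Each statement is meant to be submitted on its own, since only one command is accepted at a time:

[source, sql]
----
-- key words are case-insensitive: these two statements are equivalent
SELECT first_name FROM musicians
select first_name from musicians

-- identifiers are used verbatim: this asks for a different field, First_Name
SELECT First_Name FROM musicians
----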

[[sql-syntax-identifiers]]
[float]
=== Identifiers

Identifiers can be of two types: __quoted__ and __unquoted__:

[source, sql]
----
SELECT ip_address FROM "hosts-*"
----

This query has two identifiers, `ip_address` and `hosts-\*` (an <<multi-index,index pattern>>). As `ip_address` does not clash with any key words, it can be used verbatim. `hosts-*`, on the other hand, cannot, as it clashes with `-` (minus operation) and `*`; hence the double quotes.

Another example:

[source, sql]
----
SELECT "from" FROM "<logstash-{now/d}>"
----

The first identifier, `from`, needs to be quoted as otherwise it clashes with the `FROM` key word (which is case insensitive and thus can be written as `from`), while the second identifier, which uses {es} <<date-math-index-names>>, would have otherwise confused the parser.

Hence, in general, and *especially* when dealing with user input, it is *highly* recommended to use quotes for identifiers. It adds minimal overhead to your queries and in return offers clarity and disambiguation.
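
A short sketch of this recommendation, assuming a hypothetical index `events` whose fields happen to be named `type`, `query` and `order` (the index and its fields are made up for illustration; the three words appear in the <<sql-syntax-reserved, reserved appendix>>):

[source, sql]
----
-- "type", "query" and "order" are reserved words, so each one is quoted;
-- quoting "events" as well is harmless and keeps the style uniform
SELECT "type", "query" FROM "events" ORDER BY "order"
----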

[[sql-syntax-literals]]
[float]
=== Literals (Constants)

{es-sql} supports two kinds of __implicitly-typed__ literals: strings and numbers.

[[sql-syntax-string-literals]]
[float]
==== String Literals

A string literal is an arbitrary number of characters bounded by single quotes `'`: `'Giant Robot'`.
To include a single quote in the string, escape it using another single quote: `'Captain EO''s Voyage'`.

NOTE: An escaped single quote is *not* a double quote (`"`), but a single quote `'` _repeated_ (`''`).
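
For instance, the escaped literal could be used in a filter as sketched below; the index `films` and the field `title` are hypothetical names chosen for illustration:

[source, sql]
----
-- the doubled quote '' stands for one ' character inside the string
SELECT "title" FROM "films" WHERE "title" = 'Captain EO''s Voyage'
----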

[[sql-syntax-numeric-literals]]
[float]
==== Numeric Literals

Numeric literals are accepted both in decimal and scientific notation with exponent marker (`e` or `E`), starting either with a digit or decimal point `.`:

[source, sql]
----
1969 -- integer notation
3.14 -- decimal notation
.1234 -- decimal notation starting with decimal point
4E5 -- scientific notation (with exponent marker)
1.2e-3 -- scientific notation with decimal point
----

Numeric literals that contain a decimal point are always interpreted as being of type `double`. Those without are considered `integer` if they fit; otherwise their type is `long` (or `BIGINT` in ANSI SQL types).
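
The typing rule can be sketched as follows, assuming that `integer` is a 32-bit signed type (largest value 2147483647) and that a `SELECT` without a `FROM` clause is accepted; the column aliases are made up for illustration:

[source, sql]
----
SELECT 2147483647 AS fits_integer, -- no decimal point and it fits: integer
       2147483648 AS needs_long,   -- no decimal point, too large for integer: long
       1.0        AS has_point     -- contains a decimal point: double
----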

[[sql-syntax-generic-literals]]
[float]
==== Generic Literals

When dealing with a literal of an arbitrary type, one creates the object by casting, typically, its string representation to the desired type. This can be achieved through the dedicated <<sql-operators-cast, cast operator>> and <<sql-functions-type-conversion, functions>>:

[source, sql]
----
123::LONG -- cast 123 to a LONG
CAST('1969-05-13T12:34:56' AS TIMESTAMP) -- cast the given string to datetime
CONVERT('10.0.0.1', IP) -- cast '10.0.0.1' to an IP
----

Do note that {es-sql} provides functions that out of the box return popular literals (like `E()`) or provide dedicated parsing for certain strings.

[[sql-syntax-single-vs-double-quotes]]
[float]
=== Single vs Double Quotes

It is worth pointing out that in SQL, single quotes `'` and double quotes `"` have different meanings and *cannot* be used interchangeably.
Single quotes are used to declare a <<sql-syntax-string-literals, string literal>>, while double quotes are used for <<sql-syntax-identifiers, identifiers>>.

To wit:

[source, sql]
----
SELECT "first_name" <1>
FROM "musicians" <1>
WHERE "last_name" <1>
= 'Carroll' <2>
----

<1> Double quotes `"` used for column and table identifiers
<2> Single quotes `'` used for a string literal
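
A common slip is to put a string value in double quotes. In the sketch below, which reuses the names from the example above, the parser would be expected to read `"Carroll"` as an identifier (for instance, a column name) rather than as text, so the first statement does not express the intended filter:

[source, sql]
----
-- likely not what was meant: "Carroll" is read as an identifier, not a string
SELECT "first_name" FROM "musicians" WHERE "last_name" = "Carroll"

-- intended form: single quotes make Carroll a string literal
SELECT "first_name" FROM "musicians" WHERE "last_name" = 'Carroll'
----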

[[sql-syntax-special-chars]]
[float]
=== Special characters

A few characters that are not alphanumeric have a dedicated meaning different from that of an operator. For completeness these are specified below:


[cols="^m,^15"]

|===

s|Char
s|Description

|* | The asterisk (or wildcard) is used in some contexts to denote all fields for a table. Can be also used as an argument to some aggregate functions.
|, | Commas are used to enumerate the elements of a list.
|. | Used in numeric constants or to separate identifier qualifiers (catalog, table, column names, etc.).
|()| Parentheses are used for specific SQL commands, function declarations or to enforce precedence.
|===

[[sql-syntax-operators]]
[float]
=== Operators

Most operators in {es-sql} have the same precedence and are left-associative. As this is done at parsing time, parentheses need to be used to enforce a different precedence.

The following table indicates the supported operators and their precedence (highest to lowest):

[cols="^2m,^,^3"]

|===

s|Operator/Element
s|Associativity
s|Description

|.
|left
|qualifier separator

|::
|left
|PostgreSQL-style type cast

|+ -
|right
|unary plus and minus (numeric literal sign)

|* / %
|left
|multiplication, division, modulo

|+ -
|left
|addition, subtraction

|BETWEEN IN LIKE
|
|range containment, string matching

|< > <= >= = <=> <> !=
|
|comparison

|NOT
|right
|logical negation

|AND
|left
|logical conjunction

|OR
|left
|logical disjunction

|===
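
As a sketch of how the table is read, the statements below show the default grouping and how parentheses override it. The index `t` and the fields `a`, `b` and `c` are hypothetical, and the first statement assumes that a `SELECT` without a `FROM` clause is accepted:

[source, sql]
----
-- * is listed above +, so the first column groups as 1 + (2 * 3);
-- the parentheses in the second column force the addition first
SELECT 1 + 2 * 3   AS default_grouping,
       (1 + 2) * 3 AS forced_grouping

-- AND is listed above OR, so this filter groups as a = 1 OR (b = 2 AND c = 3)
SELECT * FROM "t" WHERE a = 1 OR b = 2 AND c = 3
----

When the intended grouping differs from the table, or is simply not obvious to a reader, explicit parentheses make it unambiguous.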


[[sql-syntax-comments]]
[float]
=== Comments

{es-sql} allows comments, which are sequences of characters ignored by the parser.

Two styles are supported:

Single Line:: Comments start with a double dash `--` and continue until the end of the line.
Multi line:: Comments that start with `/\*` and end with `*/` (also known as C-style).


[source, sql]
----
-- single line comment
/* multi
line
comment
that supports /* nested comments */
*/
----
