SQL: Introduce HISTOGRAM grouping function (#36510)
Introduce Histogram grouping function for bucketing/grouping data based
 on a given range. Both date and numeric histograms are supported using
 the appropriate range declaration (numbers vs intervals).

SELECT HISTOGRAM(number, 50) AS h FROM index GROUP BY h
SELECT HISTOGRAM(date, INTERVAL 1 YEAR) AS h FROM index GROUP BY h

In addition, add a multiply operator for intervals
Add docs for intervals and histogram

Fix #36509

(cherry picked from commit 6ee6bb5)
costin committed Dec 14, 2018
1 parent 585c364 commit adb9aa4
Showing 56 changed files with 1,428 additions and 408 deletions.
2 changes: 1 addition & 1 deletion docs/reference/sql/concepts.asciidoc
@@ -64,4 +64,4 @@ Multiple clusters, each with its own namespace, connected to each other in a fed

|===

As one can see while the mapping between the concepts are not exactly one to one and the semantics somewhat different, there are more things in common than differences. In fact, thanks to SQL declarative nature, many concepts can move across {es} transparently and the terminology of the two likely to be used interchangeably through-out the rest of the material.
As one can see while the mapping between the concepts are not exactly one to one and the semantics somewhat different, there are more things in common than differences. In fact, thanks to SQL declarative nature, many concepts can move across {es} transparently and the terminology of the two likely to be used interchangeably throughout the rest of the material.
96 changes: 95 additions & 1 deletion docs/reference/sql/functions/date-time.asciidoc
@@ -1,7 +1,101 @@
[role="xpack"]
[testenv="basic"]
[[sql-functions-datetime]]
=== Date and Time Functions
=== Date/Time and Interval Functions and Operators

beta[]

{es-sql} offers a wide range of facilities for performing date/time manipulations.

[[sql-functions-datetime-interval]]
==== Intervals

A common requirement when dealing with date/time in general revolves around
the notion of ``interval``s, a topic that is worth exploring in the context of {es} and {es-sql}.

{es} has comprehensive support for <<date-math, date math>> both inside <<date-math-index-names, index names>> and <<mapping-date-format, queries>>.
Inside {es-sql}, the former is supported as is by passing the expression in the table name, while the latter is supported through the standard SQL `INTERVAL`.

The table below shows the mapping between {es} and {es-sql}:

[cols="^m,^m",options="header"]

|===
| {es} | {es-sql}

2+h| Index/Table date math

2+|<index-{now/M{YYYY.MM}}>

2+h| Query date math

| 1y | INTERVAL 1 YEAR
| 2M | INTERVAL 2 MONTH
| 3w | INTERVAL 21 DAY
| 4d | INTERVAL 4 DAY
| 5h | INTERVAL 5 HOUR
| 6m | INTERVAL 6 MINUTE
| 7s | INTERVAL 7 SECOND

|===
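The query date-math rows of the table amount to a simple unit rewrite. The Java sketch below is a hypothetical helper (`toSqlInterval` is not an {es-sql} API) that assumes the input is an integer followed by one of the single-letter units listed above; note that the unit lookup is case sensitive, since `M` and `m` mean different things:

```java
import java.util.Map;

// Hypothetical helper (not part of Elasticsearch SQL) that rewrites an
// Elasticsearch date-math duration such as "3w" into the SQL INTERVAL
// literal listed in the mapping table.
class DateMathToInterval {

    // Unit suffix -> SQL time unit. The lookup is case sensitive:
    // "M" means month while "m" means minute.
    private static final Map<Character, String> UNITS = Map.of(
            'y', "YEAR",
            'M', "MONTH",
            'd', "DAY",
            'h', "HOUR",
            'm', "MINUTE",
            's', "SECOND");

    static String toSqlInterval(String dateMath) {
        int last = dateMath.length() - 1;
        char unit = dateMath.charAt(last);
        long amount = Long.parseLong(dateMath.substring(0, last));
        if (unit == 'w') {
            // the table expresses weeks as days: 3w -> INTERVAL 21 DAY
            return "INTERVAL " + (amount * 7) + " DAY";
        }
        String sqlUnit = UNITS.get(unit);
        if (sqlUnit == null) {
            throw new IllegalArgumentException("Unsupported date-math unit: " + dateMath);
        }
        return "INTERVAL " + amount + " " + sqlUnit;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1y", "2M", "3w", "4d", "5h", "6m", "7s"}) {
            System.out.println(s + " -> " + toSqlInterval(s));
        }
    }
}
```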

`INTERVAL` allows either `YEAR` and `MONTH` to be mixed together _or_ `DAY`, `HOUR`, `MINUTE` and `SECOND`; units from the two groups cannot be combined in the same interval.

TIP: {es-sql} also accepts the plural form of each time unit (e.g. both `YEAR` and `YEARS` are valid).

Examples of the possible combinations are shown below:

[cols="^,^",options="header"]

|===
| Interval | Description

| `INTERVAL '1-2' YEAR TO MONTH` | 1 year and 2 months
| `INTERVAL '3 4' DAYS TO HOURS` | 3 days and 4 hours
| `INTERVAL '5 6:12' DAYS TO MINUTES` | 5 days, 6 hours and 12 minutes
| `INTERVAL '3 4:56:01' DAY TO SECOND` | 3 days, 4 hours, 56 minutes and 1 second
| `INTERVAL '2 3:45:01.23456789' DAY TO SECOND` | 2 days, 3 hours, 45 minutes, 1 second and 234567890 nanoseconds
| `INTERVAL '123:45' HOUR TO MINUTES` | 123 hours and 45 minutes
| `INTERVAL '65:43:21.0123' HOUR TO SECONDS` | 65 hours, 43 minutes, 21 seconds and 12300000 nanoseconds
| `INTERVAL '45:01.23' MINUTES TO SECONDS` | 45 minutes, 1 second and 230000000 nanoseconds

|===
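The literals above map naturally onto `java.time` types: a `YEAR TO MONTH` interval behaves like a `java.time.Period`, and a `DAY TO SECOND` interval like a `java.time.Duration`. The Java sketch below parses two of the shapes from the table; the parser names are hypothetical, it assumes well-formed, non-negative input, and it is not the {es-sql} parser. Right-padding the fractional seconds to nine digits yields the nanosecond values listed in the table:

```java
import java.time.Duration;
import java.time.Period;

// Hypothetical parsers (not the Elasticsearch SQL implementation) for two of
// the interval literal shapes, assuming well-formed, non-negative input.
class IntervalLiterals {

    // 'Y-M' as in INTERVAL '1-2' YEAR TO MONTH
    static Period parseYearToMonth(String text) {
        String[] parts = text.trim().split("-");
        return Period.of(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), 0);
    }

    // 'D H:M:S[.fraction]' as in INTERVAL '2 3:45:01.23456789' DAY TO SECOND
    static Duration parseDayToSecond(String text) {
        String[] dayAndTime = text.trim().split(" ");
        long days = Long.parseLong(dayAndTime[0]);
        String[] hms = dayAndTime[1].split(":");
        long hours = Long.parseLong(hms[0]);
        long minutes = Long.parseLong(hms[1]);
        String[] secondsAndFraction = hms[2].split("\\.");
        long seconds = Long.parseLong(secondsAndFraction[0]);
        long nanos = 0;
        if (secondsAndFraction.length > 1) {
            // right-pad the fraction to 9 digits: ".23" means 230000000 ns
            String fraction = (secondsAndFraction[1] + "000000000").substring(0, 9);
            nanos = Long.parseLong(fraction);
        }
        return Duration.ofDays(days)
                .plusHours(hours)
                .plusMinutes(minutes)
                .plusSeconds(seconds)
                .plusNanos(nanos);
    }

    public static void main(String[] args) {
        System.out.println(parseYearToMonth("1-2"));
        System.out.println(parseDayToSecond("2 3:45:01.23456789"));
    }
}
```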

==== Operators

Basic arithmetic operators (`+`, `-`, etc.) support date-time parameters as indicated below:

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtIntervalPlusInterval]
--------------------------------------------------

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtDatePlusInterval]
--------------------------------------------------

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtMinusInterval]
--------------------------------------------------

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtIntervalMinusInterval]
--------------------------------------------------

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtDateMinusInterval]
--------------------------------------------------

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtIntervalMul]
--------------------------------------------------
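As a rough analogy for the operators above, the same kinds of arithmetic can be expressed with `java.time`, assuming a `Period` stands in for a `YEAR`/`MONTH` interval and a `Duration` for a `DAY` to `SECOND` interval. The sketch only illustrates the semantics, does not reflect the {es-sql} implementation, and follows `java.time` rules for month arithmetic:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.Period;

// Illustrative analogy only: interval operators expressed with java.time,
// where Period plays the role of a YEAR/MONTH interval and Duration the role
// of a DAY..SECOND interval. This is not the Elasticsearch SQL code path.
class IntervalOperators {

    static final LocalDateTime DATE = LocalDateTime.of(2018, 12, 14, 10, 0);

    // interval + interval: 1 year plus 2 months
    static final Period YEAR_TO_MONTH = Period.ofYears(1).plus(Period.ofMonths(2));

    // date + interval
    static final LocalDateTime LATER = DATE.plus(YEAR_TO_MONTH);

    // interval - interval: 5 hours minus 30 minutes
    static final Duration HOURS = Duration.ofHours(5).minus(Duration.ofMinutes(30));

    // date - interval
    static final LocalDateTime EARLIER = DATE.minus(HOURS);

    // interval * number
    static final Duration TRIPLED = Duration.ofDays(1).multipliedBy(3);

    public static void main(String[] args) {
        System.out.println(YEAR_TO_MONTH + " " + LATER);
        System.out.println(HOURS + " " + EARLIER);
        System.out.println(TRIPLED);
    }
}
```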

==== Functions

beta[]

54 changes: 54 additions & 0 deletions docs/reference/sql/functions/grouping.asciidoc
@@ -0,0 +1,54 @@
[role="xpack"]
[testenv="basic"]
[[sql-functions-grouping]]
=== Grouping Functions

beta[]

Functions for creating special __grouping__s (also known as _bucketing_). As such, these functions need to be used
as part of the <<sql-syntax-group-by, grouping>>.

[[sql-functions-grouping-histogram]]
==== `HISTOGRAM`

.Synopsis
[source, sql]
----
HISTOGRAM ( numeric_exp<1>, numeric_interval<2>)
HISTOGRAM ( date_exp<3>, date_time_interval<4>)
----

*Input*:

<1> numeric expression (typically a field)
<2> numeric interval
<3> date/time expression (typically a field)
<4> date/time <<sql-functions-datetime-interval, interval>>

*Output*: non-empty buckets or groups of the given expression divided according to the given interval

.Description

The histogram function takes all matching values and divides them into buckets of fixed size matching the given interval, using (roughly) the following formula:

[source, sql]
----
bucket_key = Math.floor(value / interval) * interval
----
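The formula can be checked with a few values. In this Java sketch, `bucketKey` is a hypothetical name, not an {es-sql} function; because `Math.floor` rounds toward negative infinity, a value such as `-1` with an interval of `50` falls into bucket `-50` rather than `0`:

```java
// Sketch of the numeric bucket formula; bucketKey is a hypothetical name.
// Math.floor rounds toward negative infinity, so negative values land in the
// bucket below zero rather than in bucket 0.
class HistogramBucket {

    static double bucketKey(double value, double interval) {
        return Math.floor(value / interval) * interval;
    }

    public static void main(String[] args) {
        double[] values = {0, 49.9, 50, 123, -1};
        for (double v : values) {
            System.out.println(v + " -> " + bucketKey(v, 50));
        }
    }
}
```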

`HISTOGRAM` can be applied to either numeric fields:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs.csv-spec[histogramNumeric]
----

or date/time fields:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs.csv-spec[histogramDate]
----
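Conceptually, grouping by `HISTOGRAM` resembles counting values per bucket key while keeping only the buckets that receive at least one value. The Java sketch below is an in-memory analogy under that assumption: the class and method names are hypothetical, it is not the {es-sql} implementation, and the yearly variant assumes that `INTERVAL 1 YEAR` buckets start on January 1st:

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

// Rough in-memory analogy of
//   SELECT HISTOGRAM(x, interval) AS h, COUNT(*) FROM ... GROUP BY h
// Only buckets that receive at least one value appear in the result.
// Names are hypothetical; this is not the Elasticsearch SQL implementation.
class HistogramGrouping {

    static Map<Double, Integer> numericHistogram(double[] values, double interval) {
        Map<Double, Integer> counts = new TreeMap<>();
        for (double v : values) {
            double key = Math.floor(v / interval) * interval;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Assumed semantics of HISTOGRAM(date, INTERVAL 1 YEAR): each date falls
    // into the bucket starting on January 1st of its year.
    static Map<LocalDate, Integer> yearlyHistogram(LocalDate[] dates) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        for (LocalDate d : dates) {
            counts.merge(LocalDate.of(d.getYear(), 1, 1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // the 50.0 bucket receives no values and is therefore absent
        System.out.println(numericHistogram(new double[] {3, 7, 120, 149}, 50));
        System.out.println(yearlyHistogram(new LocalDate[] {
                LocalDate.of(1985, 3, 1), LocalDate.of(1985, 11, 30), LocalDate.of(1990, 6, 15)}));
    }
}
```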


2 changes: 2 additions & 0 deletions docs/reference/sql/functions/index.asciidoc
@@ -9,6 +9,7 @@ beta[]

* <<sql-operators, Operators>>
* <<sql-functions-aggs, Aggregate>>
* <<sql-functions-grouping, Grouping>>
* <<sql-functions-datetime, Date-Time>>
* <<sql-functions-search, Full-Text Search>>
* <<sql-functions-math, Mathematical>>
@@ -19,6 +20,7 @@ beta[]

include::operators.asciidoc[]
include::aggs.asciidoc[]
include::grouping.asciidoc[]
include::date-time.asciidoc[]
include::search.asciidoc[]
include::math.asciidoc[]
73 changes: 51 additions & 22 deletions docs/reference/sql/language/data-types.asciidoc
@@ -7,42 +7,71 @@ beta[]

Most of {es} <<mapping-types, data types>> are available in {es-sql}, as indicated below:

[cols="^,^,^",options="header"]
[cols="^,^m,^",options="header"]

|===
| {es} type | SQL type | SQL precision
| {es} type | SQL type | SQL precision

3+h| Core types

| <<null-value, `null`>> | `null` | 0
| <<boolean, `boolean`>> | `boolean` | 1
| <<number, `byte`>> | `tinyint` | 3
| <<number, `short`>> | `smallint` | 5
| <<number, `integer`>> | `integer` | 10
| <<number, `long`>> | `bigint` | 19
| <<number, `double`>> | `double` | 15
| <<number, `float`>> | `real` | 7
| <<number, `half_float`>> | `float` | 16
| <<number, `scaled_float`>> | `float` | 19
| <<keyword, `keyword`>> | `varchar` | based on <<ignore-above>>
| <<text, `text`>> | `varchar` | 2,147,483,647
| <<binary, `binary`>> | `varbinary` | 2,147,483,647
| <<date, `date`>> | `timestamp` | 24

3+h| Complex types

| <<object, `object`>> | `struct` | 0
| <<nested, `nested`>> | `struct` | 0
| <<null-value, `null`>> | null | 0
| <<boolean, `boolean`>> | boolean | 1
| <<number, `byte`>> | tinyint | 3
| <<number, `short`>> | smallint | 5
| <<number, `integer`>> | integer | 10
| <<number, `long`>> | bigint | 19
| <<number, `double`>> | double | 15
| <<number, `float`>> | real | 7
| <<number, `half_float`>> | float | 16
| <<number, `scaled_float`>> | float | 19
| <<keyword, `keyword`>> | varchar | based on <<ignore-above>>
| <<text, `text`>> | varchar | 2,147,483,647
| <<binary, `binary`>> | varbinary | 2,147,483,647
| <<date, `date`>> | timestamp | 24
| <<ip, `ip`>> | varchar | 39

3+h| Complex types

| <<object, `object`>> | struct | 0
| <<nested, `nested`>> | struct | 0

3+h| Unsupported types

| _types not mentioned above_ | `unsupported`| 0
| _types not mentioned above_ | unsupported | 0

|===


Obviously, not all types in {es} have an equivalent in SQL and vice versa; hence {es-sql}
uses the data type _particularities_ of the former over the latter, as ultimately {es} is the backing store.

In addition to the types above, {es-sql} also supports, at _runtime_, SQL-specific types that do not have an equivalent in {es}.
Such types cannot be loaded from {es} (as it does not know about them); however, they can be used inside {es-sql} in queries or their results.

The table below indicates these types:

[cols="^m,^",options="header"]

|===
| SQL type | SQL precision


| interval_year | 7
| interval_month | 7
| interval_day | 23
| interval_hour | 23
| interval_minute | 23
| interval_second | 23
| interval_year_to_month | 7
| interval_day_to_hour | 23
| interval_day_to_minute | 23
| interval_day_to_second | 23
| interval_hour_to_minute | 23
| interval_hour_to_second | 23
| interval_minute_to_second | 23

|===
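The precisions in this table follow the two interval families of <<sql-functions-datetime-interval, intervals>>: types built only from `YEAR`/`MONTH` have precision 7, while those built from `DAY` through `SECOND` have precision 23. The Java sketch below encodes that grouping; the class is hypothetical and the values are copied from the table:

```java
import java.util.Set;

// Hypothetical classification of the runtime interval types: one family for
// YEAR/MONTH, one for DAY..SECOND, with precisions copied from the table.
class IntervalTypes {

    static final Set<String> YEAR_MONTH = Set.of(
            "interval_year", "interval_month", "interval_year_to_month");

    static final Set<String> DAY_TIME = Set.of(
            "interval_day", "interval_hour", "interval_minute", "interval_second",
            "interval_day_to_hour", "interval_day_to_minute", "interval_day_to_second",
            "interval_hour_to_minute", "interval_hour_to_second", "interval_minute_to_second");

    static int precision(String sqlType) {
        if (YEAR_MONTH.contains(sqlType)) {
            return 7;
        }
        if (DAY_TIME.contains(sqlType)) {
            return 23;
        }
        throw new IllegalArgumentException("Not an interval type: " + sqlType);
    }

    public static void main(String[] args) {
        System.out.println(precision("interval_year_to_month"));
        System.out.println(precision("interval_day_to_second"));
    }
}
```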


[[sql-multi-field]]
[float]
@@ -122,8 +122,7 @@ public boolean wasNull() throws SQLException {

@Override
public String getString(int columnIndex) throws SQLException {
Object val = column(columnIndex);
return val != null ? val.toString() : null;
return getObject(columnIndex, String.class);
}

@Override
@@ -5,6 +5,8 @@
*/
package org.elasticsearch.xpack.sql.jdbc;

import org.elasticsearch.xpack.sql.proto.StringUtils;

import java.sql.Date;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
@@ -118,10 +120,11 @@ static <T> T convert(Object val, EsType columnType, Class<T> type, String typeSt
return (T) convert(val, columnType, typeString);
}

// converting a Long to a Timestamp shouldn't be possible according to the spec,
// it feels a little brittle to check this scenario here and I don't particularly like it
// TODO: can we do any better or should we go over the spec and allow getLong(date) to be valid?
if (!(type == Long.class && columnType == EsType.DATE) && type.isInstance(val)) {
// if the value type is the same as the target, no conversion is needed
// make sure though to check the internal type against the desired one
// since otherwise the internal object format can leak out
// (for example dates when longs are requested or intervals for strings)
if (type.isInstance(val) && TypeUtils.classOf(columnType) == type) {
try {
return type.cast(val);
} catch (ClassCastException cce) {
@@ -268,7 +271,7 @@ private static Float floatValue(Object v) {
}

private static String asString(Object nativeValue) {
return nativeValue == null ? null : String.valueOf(nativeValue);
return nativeValue == null ? null : StringUtils.toString(nativeValue);
}

private static <T> T failConversion(Object value, EsType columnType, String typeString, Class<T> target) throws SQLException {
@@ -26,8 +26,7 @@ public void testExplainBasic() throws IOException {
assertThat(command("EXPLAIN " + (randomBoolean() ? "" : "(PLAN ANALYZED) ") + "SELECT * FROM test"), containsString("plan"));
assertThat(readLine(), startsWith("----------"));
assertThat(readLine(), startsWith("Project[[test_field{f}#"));
assertThat(readLine(), startsWith("\\_SubQueryAlias[test]"));
assertThat(readLine(), startsWith(" \\_EsRelation[test][test_field{f}#"));
assertThat(readLine(), startsWith("\\_EsRelation[test][test_field{f}#"));
assertEquals("", readLine());

assertThat(command("EXPLAIN (PLAN OPTIMIZED) SELECT * FROM test"), containsString("plan"));
@@ -74,8 +73,7 @@ public void testExplainWithWhere() throws IOException {
assertThat(readLine(), startsWith("----------"));
assertThat(readLine(), startsWith("Project[[i{f}#"));
assertThat(readLine(), startsWith("\\_Filter[i{f}#"));
assertThat(readLine(), startsWith(" \\_SubQueryAlias[test]"));
assertThat(readLine(), startsWith(" \\_EsRelation[test][i{f}#"));
assertThat(readLine(), startsWith(" \\_EsRelation[test][i{f}#"));
assertEquals("", readLine());

assertThat(command("EXPLAIN (PLAN OPTIMIZED) SELECT * FROM test WHERE i = 2"), containsString("plan"));
@@ -134,8 +132,7 @@ public void testExplainWithCount() throws IOException {
containsString("plan"));
assertThat(readLine(), startsWith("----------"));
assertThat(readLine(), startsWith("Aggregate[[],[COUNT(1)#"));
assertThat(readLine(), startsWith("\\_SubQueryAlias[test]"));
assertThat(readLine(), startsWith(" \\_EsRelation[test][i{f}#"));
assertThat(readLine(), startsWith("\\_EsRelation[test][i{f}#"));
assertEquals("", readLine());

assertThat(command("EXPLAIN (PLAN OPTIMIZED) SELECT COUNT(*) FROM test"), containsString("plan"));
@@ -74,7 +74,7 @@ protected void assertResults(ResultSet expected, ResultSet elastic) throws SQLEx

@Override
protected boolean logEsResultSet() {
return true;
return false;
}

@Override
@@ -38,6 +38,10 @@ public void testShowFunctions() throws IOException {
while (aggregateFunction.matcher(line).matches()) {
line = readLine();
}
Pattern groupingFunction = Pattern.compile("\\s*[A-Z0-9_~]+\\s*\\|\\s*GROUPING\\s*");
while (groupingFunction.matcher(line).matches()) {
line = readLine();
}
Pattern conditionalFunction = Pattern.compile("\\s*[A-Z0-9_~]+\\s*\\|\\s*CONDITIONAL\\s*");
while (conditionalFunction.matcher(line).matches()) {
line = readLine();
@@ -16,7 +16,7 @@ public abstract class DebugSqlSpec extends SqlSpecTestCase {
@ParametersFactory(shuffle = false, argumentFormatting = PARAM_FORMATTING)
public static List<Object[]> readScriptSpec() throws Exception {
Parser parser = specParser();
return readScriptSpec("/debug.sql-spec", parser);
return readScriptSpec("/datetime.sql-spec", parser);
}

public DebugSqlSpec(String fileName, String groupName, String testName, Integer lineNumber, String query) {