SQL: Introduce HISTOGRAM grouping function #36510

Merged
merged 5 commits into from
Dec 14, 2018
2 changes: 1 addition & 1 deletion docs/reference/sql/concepts.asciidoc
@@ -64,4 +64,4 @@ Multiple clusters, each with its own namespace, connected to each other in a fed

|===

As one can see while the mapping between the concepts are not exactly one to one and the semantics somewhat different, there are more things in common than differences. In fact, thanks to SQL declarative nature, many concepts can move across {es} transparently and the terminology of the two likely to be used interchangeably through-out the rest of the material.
As one can see while the mapping between the concepts are not exactly one to one and the semantics somewhat different, there are more things in common than differences. In fact, thanks to SQL declarative nature, many concepts can move across {es} transparently and the terminology of the two likely to be used interchangeably throughout the rest of the material.
96 changes: 95 additions & 1 deletion docs/reference/sql/functions/date-time.asciidoc
@@ -1,7 +1,101 @@
[role="xpack"]
[testenv="basic"]
[[sql-functions-datetime]]
=== Date and Time Functions
=== Date/Time and Interval Functions and Operators

Contributor

please add the beta[] marker

Member Author

done.

beta[]

{es-sql} offers a wide range of facilities for performing date/time manipulations.

[[sql-functions-datetime-interval]]
==== Intervals

A common requirement when dealing with date/time in general revolves around
the notion of ``interval``s, a topic that is worth exploring in the context of {es} and {es-sql}.

{es} has comprehensive support for <<date-math, date math>> both inside <<date-math-index-names, index names>> and <<mapping-date-format, queries>>.
Inside {es-sql} the former is supported as is by passing the expression in the table name, while the latter is supported through the standard SQL `INTERVAL`.

The table below shows the mapping between {es} and {es-sql}:

[cols="^m,^m",options="header"]

|===
| {es} | {es-sql}

2+h| Index/Table date math

2+|<index-{now/M{YYYY.MM}}>

2+h| Query date math

| 1y | INTERVAL 1 YEAR
| 2M | INTERVAL 2 MONTH
| 3w | INTERVAL 21 DAY
| 4d | INTERVAL 4 DAY
| 5h | INTERVAL 5 HOUR
| 6m | INTERVAL 6 MINUTE
| 7s | INTERVAL 7 SECOND

|===
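
The unit mapping in the table above can be sketched in a few lines of Java. This is only an illustration of the table rows, not code from {es-sql}; the class and method names (`DateMathToInterval`, `toSqlInterval`) are hypothetical, and only the single-unit forms listed in the table are handled.

```java
final class DateMathToInterval {

    // Converts a single-unit date math offset such as "3w" into the
    // SQL INTERVAL literal shown in the table (hypothetical helper).
    static String toSqlInterval(String dateMath) {
        int last = dateMath.length() - 1;
        int amount = Integer.parseInt(dateMath.substring(0, last));
        char unit = dateMath.charAt(last);
        switch (unit) {
            case 'y': return "INTERVAL " + amount + " YEAR";
            case 'M': return "INTERVAL " + amount + " MONTH";
            // the table has no WEEK unit on the SQL side, so weeks become days
            case 'w': return "INTERVAL " + (amount * 7) + " DAY";
            case 'd': return "INTERVAL " + amount + " DAY";
            case 'h': return "INTERVAL " + amount + " HOUR";
            case 'm': return "INTERVAL " + amount + " MINUTE";
            case 's': return "INTERVAL " + amount + " SECOND";
            default:
                throw new IllegalArgumentException("Unknown date math unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSqlInterval("3w"));
        System.out.println(toSqlInterval("2M"));
    }
}
```

Note that the unit letters are case sensitive: `M` is months while `m` is minutes.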

`INTERVAL` allows either `YEAR` and `MONTH` to be mixed together _or_ `DAY`, `HOUR`, `MINUTE` and `SECOND`.

TIP: {es-sql} also accepts the plural form of each time unit (for example, both `YEAR` and `YEARS` are valid).

Examples of the possible combinations are shown below:

[cols="^,^",options="header"]

|===
| Interval | Description

| `INTERVAL '1-2' YEAR TO MONTH` | 1 year and 2 months
| `INTERVAL '3 4' DAYS TO HOURS` | 3 days and 4 hours
| `INTERVAL '5 6:12' DAYS TO MINUTES` | 5 days, 6 hours and 12 minutes
| `INTERVAL '3 4:56:01' DAY TO SECOND` | 3 days, 4 hours, 56 minutes and 1 second
| `INTERVAL '2 3:45:01.23456789' DAY TO SECOND` | 2 days, 3 hours, 45 minutes, 1 second and 234567890 nanoseconds
| `INTERVAL '123:45' HOUR TO MINUTES` | 123 hours and 45 minutes
| `INTERVAL '65:43:21.0123' HOUR TO SECONDS` | 65 hours, 43 minutes, 21 seconds and 12300000 nanoseconds
| `INTERVAL '45:01.23' MINUTES TO SECONDS` | 45 minutes, 1 second and 230000000 nanoseconds

|===
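
To make the nanosecond figures in the table concrete, here is a minimal sketch that reads a `DAY TO SECOND` literal body into a `java.time.Duration`. It is not the {es-sql} parser; the class and method names are hypothetical, and the sketch assumes a well-formed, non-negative `'D H:M:S[.fraction]'` string with no validation.

```java
import java.time.Duration;

final class IntervalLiteralSketch {

    // Parses the quoted part of INTERVAL 'D H:M:S[.fraction]' DAY TO SECOND
    // (hypothetical helper; assumes well-formed, non-negative input).
    static Duration parseDayToSecond(String literal) {
        String[] dayAndTime = literal.trim().split(" ");
        String[] hms = dayAndTime[1].split(":");
        String[] secAndFraction = hms[2].split("\\.");

        long nanos = 0;
        if (secAndFraction.length == 2) {
            // right-pad the fraction to 9 digits so that ".23" becomes 230000000 ns
            String padded = (secAndFraction[1] + "000000000").substring(0, 9);
            nanos = Long.parseLong(padded);
        }

        return Duration.ofDays(Long.parseLong(dayAndTime[0]))
                .plusHours(Long.parseLong(hms[0]))
                .plusMinutes(Long.parseLong(hms[1]))
                .plusSeconds(Long.parseLong(secAndFraction[0]))
                .plusNanos(nanos);
    }

    public static void main(String[] args) {
        Duration d = parseDayToSecond("2 3:45:01.23456789");
        System.out.println(d.getSeconds() + " s, " + d.getNano() + " ns");
    }
}
```

The fraction is padded rather than parsed as a decimal so that the digit count, not the numeric value, decides the scale: `.23456789` has eight digits and therefore yields 234567890 nanoseconds.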

==== Operators

Basic arithmetic operators (`+`, `-`, etc.) support date-time parameters as indicated below:

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtIntervalPlusInterval]
--------------------------------------------------

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtDatePlusInterval]
--------------------------------------------------

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtMinusInterval]
--------------------------------------------------

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtIntervalMinusInterval]
--------------------------------------------------

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtDateMinusInterval]
--------------------------------------------------

["source","sql",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{sql-specs}/docs.csv-spec[dtIntervalMul]
--------------------------------------------------

==== Functions

beta[]

54 changes: 54 additions & 0 deletions docs/reference/sql/functions/grouping.asciidoc
@@ -0,0 +1,54 @@
[role="xpack"]
[testenv="basic"]
[[sql-functions-grouping]]
=== Grouping Functions

beta[]

Functions for creating special __grouping__s (also known as _bucketing_); as such these need to be used
as part of the <<sql-syntax-group-by, grouping>>.

[[sql-functions-grouping-histogram]]
==== `HISTOGRAM`

.Synopsis
[source, sql]
----
HISTOGRAM ( numeric_exp<1>, numeric_interval<2>)
HISTOGRAM ( date_exp<3>, date_time_interval<4>)
----

*Input*:

<1> numeric expression (typically a field)
<2> numeric interval
<3> date/time expression (typically a field)
<4> date/time <<sql-functions-datetime-interval, interval>>

*Output*: non-empty buckets or groups of the given expression divided according to the given interval

.Description

The histogram function takes all matching values and divides them into buckets with fixed size matching the given interval, using (roughly) the following formula:

[source, sql]
----
bucket_key = Math.floor(value / interval) * interval
----
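
As a rough illustration of the formula above, the following Java sketch computes bucket keys and counts values per bucket. It is an assumption-laden toy, not how {es} evaluates the aggregation: the names `HistogramSketch`, `bucketKey` and `histogram` are hypothetical, and only numeric values are covered.

```java
import java.util.Map;
import java.util.TreeMap;

final class HistogramSketch {

    // bucket_key = floor(value / interval) * interval
    static double bucketKey(double value, double interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return Math.floor(value / interval) * interval;
    }

    // Counts values per bucket key; only non-empty buckets appear in the result.
    static Map<Double, Integer> histogram(double[] values, double interval) {
        Map<Double, Integer> buckets = new TreeMap<>();
        for (double v : values) {
            buckets.merge(bucketKey(v, interval), 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(histogram(new double[] {1, 4, 5, 9, 12, -3}, 5));
    }
}
```

Because the formula uses `floor` rather than truncation, negative values fall into the bucket below them: with an interval of 5, the value -3 lands in the bucket keyed -5, not 0.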

`Histogram` can be applied on either numeric fields:


["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs.csv-spec[histogramNumeric]
----

or date/time fields:

["source","sql",subs="attributes,callouts,macros"]
----
include-tagged::{sql-specs}/docs.csv-spec[histogramDate]
----


2 changes: 2 additions & 0 deletions docs/reference/sql/functions/index.asciidoc
@@ -9,6 +9,7 @@ beta[]

* <<sql-operators, Operators>>
* <<sql-functions-aggs, Aggregate>>
* <<sql-functions-grouping, Grouping>>
* <<sql-functions-datetime, Date-Time>>
* <<sql-functions-search, Full-Text Search>>
* <<sql-functions-math, Mathematical>>
@@ -19,6 +20,7 @@ beta[]

include::operators.asciidoc[]
include::aggs.asciidoc[]
include::grouping.asciidoc[]
include::date-time.asciidoc[]
include::search.asciidoc[]
include::math.asciidoc[]
73 changes: 51 additions & 22 deletions docs/reference/sql/language/data-types.asciidoc
@@ -7,42 +7,71 @@ beta[]

Most of {es} <<mapping-types, data types>> are available in {es-sql}, as indicated below:

[cols="^,^,^",options="header"]
[cols="^,^m,^",options="header"]

|===
| {es} type | SQL type | SQL precision
| {es} type | SQL type | SQL precision

3+h| Core types

| <<null-value, `null`>> | `null` | 0
| <<boolean, `boolean`>> | `boolean` | 1
| <<number, `byte`>> | `tinyint` | 3
| <<number, `short`>> | `smallint` | 5
| <<number, `integer`>> | `integer` | 10
| <<number, `long`>> | `bigint` | 19
| <<number, `double`>> | `double` | 15
| <<number, `float`>> | `real` | 7
| <<number, `half_float`>> | `float` | 16
| <<number, `scaled_float`>> | `float` | 19
| <<keyword, `keyword`>> | `varchar` | based on <<ignore-above>>
| <<text, `text`>> | `varchar` | 2,147,483,647
| <<binary, `binary`>> | `varbinary` | 2,147,483,647
| <<date, `date`>> | `timestamp` | 24

3+h| Complex types

| <<object, `object`>> | `struct` | 0
| <<nested, `nested`>> | `struct` | 0
| <<null-value, `null`>> | null | 0
| <<boolean, `boolean`>> | boolean | 1
| <<number, `byte`>> | tinyint | 3
| <<number, `short`>> | smallint | 5
| <<number, `integer`>> | integer | 10
| <<number, `long`>> | bigint | 19
| <<number, `double`>> | double | 15
| <<number, `float`>> | real | 7
| <<number, `half_float`>> | float | 16
| <<number, `scaled_float`>> | float | 19
| <<keyword, `keyword`>> | varchar | based on <<ignore-above>>
| <<text, `text`>> | varchar | 2,147,483,647
| <<binary, `binary`>> | varbinary | 2,147,483,647
| <<date, `date`>> | timestamp | 24
| <<ip, `ip`>> | varchar | 39

3+h| Complex types

| <<object, `object`>> | struct | 0
| <<nested, `nested`>> | struct | 0

3+h| Unsupported types

| _types not mentioned above_ | `unsupported`| 0
| _types not mentioned above_ | unsupported | 0

|===


Not all types in {es} have an equivalent in SQL and vice versa; where they differ, {es-sql}
uses the data type _particularities_ of the former over the latter, as ultimately {es} is the backing store.

In addition to the types above, {es-sql} also supports at _runtime_ SQL-specific types that do not have an equivalent in {es}.
Such types cannot be loaded from {es} (as it does not know about them); however, they can be used inside {es-sql} in queries or their results.

The table below indicates these types:

[cols="^m,^",options="header"]

|===
| SQL type | SQL precision


| interval_year | 7
| interval_month | 7
| interval_day | 23
| interval_hour | 23
| interval_minute | 23
| interval_second | 23
| interval_year_to_month | 7
| interval_day_to_hour | 23
| interval_day_to_minute | 23
| interval_day_to_second | 23
| interval_hour_to_minute | 23
| interval_hour_to_second | 23
| interval_minute_to_second | 23

|===


[[sql-multi-field]]
[float]
@@ -122,8 +122,7 @@ public boolean wasNull() throws SQLException {

@Override
public String getString(int columnIndex) throws SQLException {
Object val = column(columnIndex);
return val != null ? val.toString() : null;
return getObject(columnIndex, String.class);
}

@Override
@@ -5,6 +5,8 @@
*/
package org.elasticsearch.xpack.sql.jdbc;

import org.elasticsearch.xpack.sql.proto.StringUtils;

import java.sql.Date;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
@@ -118,10 +120,11 @@ static <T> T convert(Object val, EsType columnType, Class<T> type, String typeSt
return (T) convert(val, columnType, typeString);
}

// converting a Long to a Timestamp shouldn't be possible according to the spec,
// it feels a little brittle to check this scenario here and I don't particularly like it
// TODO: can we do any better or should we go over the spec and allow getLong(date) to be valid?
if (!(type == Long.class && columnType == EsType.DATE) && type.isInstance(val)) {
// if the value type is the same as the target, no conversion is needed
// make sure though to check the internal type against the desired one
// since otherwise the internal object format can leak out
// (for example dates when longs are requested or intervals for strings)
if (type.isInstance(val) && TypeUtils.classOf(columnType) == type) {
try {
return type.cast(val);
} catch (ClassCastException cce) {
@@ -268,7 +271,7 @@ private static Float floatValue(Object v) {
}

private static String asString(Object nativeValue) {
return nativeValue == null ? null : String.valueOf(nativeValue);
return nativeValue == null ? null : StringUtils.toString(nativeValue);
}

private static <T> T failConversion(Object value, EsType columnType, String typeString, Class<T> target) throws SQLException {
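
The new guard in `convert` can be illustrated in isolation. The sketch below is not the driver code: `ColumnType` is a hypothetical stand-in for `EsType`, the local `classOf` is a simplified stand-in for `TypeUtils.classOf`, and the assumption that a date column is held internally as a `Long` is taken from the code comment in the hunk ("dates when longs are requested").

```java
import java.sql.Timestamp;

final class ConversionGuardSketch {

    // Hypothetical stand-in for EsType, reduced to two cases.
    enum ColumnType { LONG, DATE }

    // Simplified stand-in for TypeUtils.classOf: the Java class a column type maps to.
    static Class<?> classOf(ColumnType type) {
        return type == ColumnType.LONG ? Long.class : Timestamp.class;
    }

    // The value may be returned unconverted only if it already has the requested
    // class AND the column's declared type maps to that same class.
    static boolean canReturnAsIs(Object val, ColumnType columnType, Class<?> requested) {
        return requested.isInstance(val) && classOf(columnType) == requested;
    }

    public static void main(String[] args) {
        Object internalValue = 1544745600000L; // a Long, as assumed for both column types
        System.out.println(canReturnAsIs(internalValue, ColumnType.LONG, Long.class));
        System.out.println(canReturnAsIs(internalValue, ColumnType.DATE, Long.class));
    }
}
```

Checking `isInstance` alone would let the raw internal `Long` of a date column leak out when a `Long` is requested; the second condition forces such requests through the regular conversion path instead.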
@@ -26,8 +26,7 @@ public void testExplainBasic() throws IOException {
assertThat(command("EXPLAIN " + (randomBoolean() ? "" : "(PLAN ANALYZED) ") + "SELECT * FROM test"), containsString("plan"));
assertThat(readLine(), startsWith("----------"));
assertThat(readLine(), startsWith("Project[[test_field{f}#"));
assertThat(readLine(), startsWith("\\_SubQueryAlias[test]"));
assertThat(readLine(), startsWith(" \\_EsRelation[test][test_field{f}#"));
assertThat(readLine(), startsWith("\\_EsRelation[test][test_field{f}#"));
assertEquals("", readLine());

assertThat(command("EXPLAIN (PLAN OPTIMIZED) SELECT * FROM test"), containsString("plan"));
@@ -74,8 +73,7 @@ public void testExplainWithWhere() throws IOException {
assertThat(readLine(), startsWith("----------"));
assertThat(readLine(), startsWith("Project[[i{f}#"));
assertThat(readLine(), startsWith("\\_Filter[i{f}#"));
assertThat(readLine(), startsWith(" \\_SubQueryAlias[test]"));
assertThat(readLine(), startsWith(" \\_EsRelation[test][i{f}#"));
assertThat(readLine(), startsWith(" \\_EsRelation[test][i{f}#"));
assertEquals("", readLine());

assertThat(command("EXPLAIN (PLAN OPTIMIZED) SELECT * FROM test WHERE i = 2"), containsString("plan"));
@@ -134,8 +132,7 @@ public void testExplainWithCount() throws IOException {
containsString("plan"));
assertThat(readLine(), startsWith("----------"));
assertThat(readLine(), startsWith("Aggregate[[],[COUNT(1)#"));
assertThat(readLine(), startsWith("\\_SubQueryAlias[test]"));
assertThat(readLine(), startsWith(" \\_EsRelation[test][i{f}#"));
assertThat(readLine(), startsWith("\\_EsRelation[test][i{f}#"));
assertEquals("", readLine());

assertThat(command("EXPLAIN (PLAN OPTIMIZED) SELECT COUNT(*) FROM test"), containsString("plan"));
@@ -74,7 +74,7 @@ protected void assertResults(ResultSet expected, ResultSet elastic) throws SQLEx

@Override
protected boolean logEsResultSet() {
return true;
return false;
}

@Override
@@ -38,6 +38,10 @@ public void testShowFunctions() throws IOException {
while (aggregateFunction.matcher(line).matches()) {
line = readLine();
}
Pattern groupingFunction = Pattern.compile("\\s*[A-Z0-9_~]+\\s*\\|\\s*GROUPING\\s*");
while (groupingFunction.matcher(line).matches()) {
line = readLine();
}
Pattern conditionalFunction = Pattern.compile("\\s*[A-Z0-9_~]+\\s*\\|\\s*CONDITIONAL\\s*");
while (conditionalFunction.matcher(line).matches()) {
line = readLine();
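
For readers unfamiliar with the row format these patterns consume, here is a small standalone check of the `GROUPING` pattern from the hunk above. The sample rows are assumptions about what `SHOW FUNCTIONS` prints in the CLI (a name column, a pipe, a type column); only the regular expression itself is taken from the diff, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

final class GroupingRowPattern {

    // Same expression as in testShowFunctions: NAME | GROUPING, with optional padding.
    private static final Pattern GROUPING_FUNCTION =
            Pattern.compile("\\s*[A-Z0-9_~]+\\s*\\|\\s*GROUPING\\s*");

    static boolean isGroupingRow(String line) {
        // matches() requires the whole line to fit the pattern
        return GROUPING_FUNCTION.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isGroupingRow("HISTOGRAM       |GROUPING       "));
        System.out.println(isGroupingRow("AVG             |AGGREGATE      "));
    }
}
```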
@@ -16,7 +16,7 @@ public abstract class DebugSqlSpec extends SqlSpecTestCase {
@ParametersFactory(shuffle = false, argumentFormatting = PARAM_FORMATTING)
public static List<Object[]> readScriptSpec() throws Exception {
Parser parser = specParser();
return readScriptSpec("/debug.sql-spec", parser);
return readScriptSpec("/datetime.sql-spec", parser);
Contributor

please revert

}

public DebugSqlSpec(String fileName, String groupName, String testName, Integer lineNumber, String query) {