Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Disable access to the field keyword in the new SQL engine #1025

Merged
merged 3 commits into from Feb 3, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/dev/NewSQLEngine.md
Expand Up @@ -35,11 +35,17 @@ As for correctness, besides full coverage of unit and integration test, we devel

### 3.1 Breaking Changes

#### 3.1.1 Response
Because of implementation changed internally, you can expect Explain output in a different format. For query protocol, there are slightly changes on two fields' value in the default response format:

* **Schema**: Previously the `name` and `alias` value differed for different queries. For consistency, name is always the original text now and alias is its alias defined in SELECT clause or absent if none.
* **Total**: The `total` field represented how many documents matched in total no matter how many returned (indicated by `size` field). However, this field becomes meaningless because of post processing on DSL response in the new query engine. Thus, for now the total number is always same as size field.

#### 3.1.2 Syntax and Semantic

* **field.keyword**: In Elasticsearch, user could assign different type for one field. The most common use case is assign `text` and `keyword` type for string field. In the legacy engine, user could select the `field.keyword` from the index. e.g. `SELECT addr.state FROM account WHERE addr.state = 'WA' `. In the new engine, user don't have access the to the `addr.state` field, instead, the query engine will handle it automatically.


### 3.2 Limitations

You can find all the limitations in [Limitations](/docs/user/limitations/limitations.rst). For these unsupported features, the query will be forwarded to the old query engine by fallback mechanism. To avoid impact on your side, normally you won't see any difference in a query response. If you want to check if and why your query falls back to be handled by old SQL engine, please explain your query and check Elasticsearch log for "Request is falling back to old SQL engine due to ...".
Expand Down
28 changes: 14 additions & 14 deletions docs/experiment/ppl/cmd/search.rst
Expand Up @@ -32,14 +32,14 @@ PPL query::

od> source=accounts;
fetched rows / total rows = 4/4
+------------------+-------------+----------------------+----------+--------+------------+-----------+------------+---------+-------+-----------------------+
| account_number | firstname | address | gender | city | lastname | balance | employer | state | age | email |
|------------------+-------------+----------------------+----------+--------+------------+-----------+------------+---------+-------+-----------------------|
| 1 | Amber | 880 Holmes Lane | M | Brogan | Duke | 39225 | Pyrami | IL | 32 | amberduke@pyrami.com |
| 6 | Hattie | 671 Bristol Street | M | Dante | Bond | 5686 | Netagy | TN | 36 | hattiebond@netagy.com |
| 13 | Nanette | 789 Madison Street | F | Nogal | Bates | 32838 | Quility | VA | 28 | null |
| 18 | Dale | 467 Hutchinson Court | M | Orick | Adams | 4180 | null | MD | 33 | daleadams@boink.com |
+------------------+-------------+----------------------+----------+--------+------------+-----------+------------+---------+-------+-----------------------+
+------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------+
| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
|------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------|
| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke |
| 6 | Hattie | 671 Bristol Street | 5686 | M | Dante | Netagy | TN | 36 | hattiebond@netagy.com | Bond |
| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates |
| 18 | Dale | 467 Hutchinson Court | 4180 | M | Orick | null | MD | 33 | daleadams@boink.com | Adams |
+------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------+

Example 2: Fetch data with condition
====================================
Expand All @@ -50,10 +50,10 @@ PPL query::

od> source=accounts account_number=1 or gender="F";
fetched rows / total rows = 2/2
+------------------+-------------+--------------------+----------+--------+------------+-----------+------------+---------+-------+----------------------+
| account_number | firstname | address | gender | city | lastname | balance | employer | state | age | email |
|------------------+-------------+--------------------+----------+--------+------------+-----------+------------+---------+-------+----------------------|
| 1 | Amber | 880 Holmes Lane | M | Brogan | Duke | 39225 | Pyrami | IL | 32 | amberduke@pyrami.com |
| 13 | Nanette | 789 Madison Street | F | Nogal | Bates | 32838 | Quility | VA | 28 | null |
+------------------+-------------+--------------------+----------+--------+------------+-----------+------------+---------+-------+----------------------+
+------------------+-------------+--------------------+-----------+----------+--------+------------+---------+-------+----------------------+------------+
| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
|------------------+-------------+--------------------+-----------+----------+--------+------------+---------+-------+----------------------+------------|
| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke |
| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates |
+------------------+-------------+--------------------+-----------+----------+--------+------------+---------+-------+----------------------+------------+

32 changes: 16 additions & 16 deletions docs/user/general/identifiers.rst
Expand Up @@ -40,14 +40,14 @@ Here are examples for using index pattern directly without quotes::

od> SELECT * FROM *cc*nt*;
fetched rows / total rows = 4/4
+------------------+-------------+----------------------+----------+--------+------------+-----------+------------+---------+-------+-----------------------+
| account_number | firstname | address | gender | city | lastname | balance | employer | state | age | email |
|------------------+-------------+----------------------+----------+--------+------------+-----------+------------+---------+-------+-----------------------|
| 1 | Amber | 880 Holmes Lane | M | Brogan | Duke | 39225 | Pyrami | IL | 32 | amberduke@pyrami.com |
| 6 | Hattie | 671 Bristol Street | M | Dante | Bond | 5686 | Netagy | TN | 36 | hattiebond@netagy.com |
| 13 | Nanette | 789 Madison Street | F | Nogal | Bates | 32838 | Quility | VA | 28 | null |
| 18 | Dale | 467 Hutchinson Court | M | Orick | Adams | 4180 | null | MD | 33 | daleadams@boink.com |
+------------------+-------------+----------------------+----------+--------+------------+-----------+------------+---------+-------+-----------------------+
+------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------+
| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
|------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------|
| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke |
| 6 | Hattie | 671 Bristol Street | 5686 | M | Dante | Netagy | TN | 36 | hattiebond@netagy.com | Bond |
| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates |
| 18 | Dale | 467 Hutchinson Court | 4180 | M | Orick | null | MD | 33 | daleadams@boink.com | Adams |
+------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------+


Delimited Identifiers
Expand Down Expand Up @@ -76,14 +76,14 @@ Here are examples for quoting an index name by back ticks::

od> SELECT * FROM `accounts`;
fetched rows / total rows = 4/4
+------------------+-------------+----------------------+----------+--------+------------+-----------+------------+---------+-------+-----------------------+
| account_number | firstname | address | gender | city | lastname | balance | employer | state | age | email |
|------------------+-------------+----------------------+----------+--------+------------+-----------+------------+---------+-------+-----------------------|
| 1 | Amber | 880 Holmes Lane | M | Brogan | Duke | 39225 | Pyrami | IL | 32 | amberduke@pyrami.com |
| 6 | Hattie | 671 Bristol Street | M | Dante | Bond | 5686 | Netagy | TN | 36 | hattiebond@netagy.com |
| 13 | Nanette | 789 Madison Street | F | Nogal | Bates | 32838 | Quility | VA | 28 | null |
| 18 | Dale | 467 Hutchinson Court | M | Orick | Adams | 4180 | null | MD | 33 | daleadams@boink.com |
+------------------+-------------+----------------------+----------+--------+------------+-----------+------------+---------+-------+-----------------------+
+------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------+
| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
|------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------|
| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke |
| 6 | Hattie | 671 Bristol Street | 5686 | M | Dante | Netagy | TN | 36 | hattiebond@netagy.com | Bond |
| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates |
| 18 | Dale | 467 Hutchinson Court | 4180 | M | Orick | null | MD | 33 | daleadams@boink.com | Adams |
+------------------+-------------+----------------------+-----------+----------+--------+------------+---------+-------+-----------------------+------------+


Case Sensitivity
Expand Down
Expand Up @@ -101,15 +101,6 @@ private void flatMappings(
func.accept(fullFieldName, type);
}

if (isMultiField(mapping)) {
((Map<String, Map<String, Object>>) mapping.get("fields"))
.forEach(
(innerFieldName, innerMapping) ->
func.accept(
fullFieldName + "." + innerFieldName,
(String) innerMapping.getOrDefault("type", "object")));
}

if (mapping.containsKey("properties")) { // Nested field
flatMappings((Map<String, Object>) mapping.get("properties"), fullFieldName, func);
}
Expand Down
Expand Up @@ -54,14 +54,22 @@ public class ElasticsearchIndex implements Table {
/** Current Elasticsearch index name. */
private final String indexName;

/**
* The cached mapping of field and type in index.
*/
private Map<String, ExprType> cachedFieldTypes = null;

/*
* TODO: Assume indexName doesn't have wildcard.
* Need to either handle field name conflicts
* or lazy evaluate when query engine pulls field type.
*/
@Override
public Map<String, ExprType> getFieldTypes() {
return new ElasticsearchDescribeIndexRequest(client, indexName).getFieldTypes();
if (cachedFieldTypes == null) {
cachedFieldTypes = new ElasticsearchDescribeIndexRequest(client, indexName).getFieldTypes();
}
return cachedFieldTypes;
}

/**
Expand Down
Expand Up @@ -111,7 +111,7 @@ public void getIndexMappings() throws IOException {
assertEquals(1, indexMappings.size());

IndexMapping indexMapping = indexMappings.values().iterator().next();
assertEquals(20, indexMapping.size());
assertEquals(18, indexMapping.size());
assertEquals("text", indexMapping.getFieldType("address"));
assertEquals("integer", indexMapping.getFieldType("age"));
assertEquals("double", indexMapping.getFieldType("balance"));
Expand All @@ -121,15 +121,13 @@ public void getIndexMappings() throws IOException {
assertEquals("some_new_es_type_outside_type_system", indexMapping.getFieldType("new_field"));
assertEquals("text", indexMapping.getFieldType("field with spaces"));
assertEquals("text_keyword", indexMapping.getFieldType("employer"));
assertEquals("keyword", indexMapping.getFieldType("employer.raw"));
assertEquals("nested", indexMapping.getFieldType("projects"));
assertEquals("boolean", indexMapping.getFieldType("projects.active"));
assertEquals("date", indexMapping.getFieldType("projects.release"));
assertEquals("nested", indexMapping.getFieldType("projects.members"));
assertEquals("text", indexMapping.getFieldType("projects.members.name"));
assertEquals("object", indexMapping.getFieldType("manager"));
assertEquals("text_keyword", indexMapping.getFieldType("manager.name"));
assertEquals("keyword", indexMapping.getFieldType("manager.name.keyword"));
assertEquals("keyword", indexMapping.getFieldType("manager.address"));
assertEquals("long", indexMapping.getFieldType("manager.salary"));
}
Expand Down
Expand Up @@ -110,7 +110,7 @@ void getIndexMappings() throws IOException {
assertEquals(1, indexMappings.size());

IndexMapping indexMapping = indexMappings.values().iterator().next();
assertEquals(20, indexMapping.size());
assertEquals(18, indexMapping.size());
assertEquals("text", indexMapping.getFieldType("address"));
assertEquals("integer", indexMapping.getFieldType("age"));
assertEquals("double", indexMapping.getFieldType("balance"));
Expand All @@ -120,15 +120,13 @@ void getIndexMappings() throws IOException {
assertEquals("some_new_es_type_outside_type_system", indexMapping.getFieldType("new_field"));
assertEquals("text", indexMapping.getFieldType("field with spaces"));
assertEquals("text_keyword", indexMapping.getFieldType("employer"));
assertEquals("keyword", indexMapping.getFieldType("employer.raw"));
assertEquals("nested", indexMapping.getFieldType("projects"));
assertEquals("boolean", indexMapping.getFieldType("projects.active"));
assertEquals("date", indexMapping.getFieldType("projects.release"));
assertEquals("nested", indexMapping.getFieldType("projects.members"));
assertEquals("text", indexMapping.getFieldType("projects.members.name"));
assertEquals("object", indexMapping.getFieldType("manager"));
assertEquals("text_keyword", indexMapping.getFieldType("manager.name"));
assertEquals("keyword", indexMapping.getFieldType("manager.name.keyword"));
assertEquals("keyword", indexMapping.getFieldType("manager.address"));
assertEquals("long", indexMapping.getFieldType("manager.salary"));
}
Expand Down
Expand Up @@ -136,26 +136,6 @@ public void selectWrongField() throws IOException {
assertThat(getDataRows(response).length(), equalTo(RESPONSE_DEFAULT_MAX_SIZE));
}

@Ignore("Issue 1019, Breaking Change, the keyword should not been allowed, semantic exception "
+ "is expected")
@Test
public void selectKeyword() throws IOException {
JSONObject response = executeQuery(
String.format(Locale.ROOT, "SELECT firstname.keyword FROM %s",
TestsConstants.TEST_INDEX_ACCOUNT));

List<String> fields = Collections.singletonList("firstname.keyword");
assertContainsColumns(getSchema(response), fields);

/*
* firstname.keyword will appear in Schema but because there is no 'firstname.keyword' in SearchHits source
* the DataRows will output null.
*
* Looks like x-pack adds this keyword field to "docvalue_fields", this is likely how it ends up in SearchHits
*/
// assertContainsData(getDataRows(protocol), fields);
}

@Test
public void selectScore() throws IOException {
JSONObject response = executeQuery(
Expand Down Expand Up @@ -377,17 +357,6 @@ public void aggregationFunctionInSelectWithAlias() throws IOException {
}
}

@Test
public void aggregationFunctionInSelectGroupByMultipleFields() throws IOException {
JSONObject response = executeQuery(
String.format(Locale.ROOT, "SELECT SUM(age) FROM %s GROUP BY age, state.keyword",
TestsConstants.TEST_INDEX_ACCOUNT));

List<String> fields = Arrays.asList("SUM(age)");
assertContainsColumns(getSchema(response), fields);
assertContainsData(getDataRows(response), fields);
}

@Test
public void aggregationFunctionInSelectNoGroupBy() throws IOException {
JSONObject response = executeQuery(String.format(Locale.ROOT, "SELECT SUM(age) FROM %s",
Expand Down
Expand Up @@ -65,10 +65,31 @@ public void testNonAggregatedSelectColumnPresentWithoutGroupByClause() throws IO
.hasStatusCode(BAD_REQUEST)
.hasErrorType("SemanticCheckException")
.containsMessage("Explicit GROUP BY clause is required because expression [state] "
+ "contains non-aggregated column")
+ "contains non-aggregated column")
.whenExecute("SELECT state, AVG(age) FROM elasticsearch-sql_test_index_account");
}

@Test
public void testQueryFieldWithKeyword() throws IOException {
expectResponseException()
.hasStatusCode(BAD_REQUEST)
.hasErrorType("SemanticCheckException")
.containsMessage(
"can't resolve Symbol(namespace=FIELD_NAME, name=firstname.keyword) in type env")
.whenExecute("SELECT firstname.keyword FROM elasticsearch-sql_test_index_account");
}

@Test
public void aggregationFunctionInSelectGroupByMultipleFields() throws IOException {
expectResponseException()
.hasStatusCode(BAD_REQUEST)
.hasErrorType("SemanticCheckException")
.containsMessage(
"can't resolve Symbol(namespace=FIELD_NAME, name=state.keyword) in type env")
.whenExecute(
"SELECT SUM(age) FROM elasticsearch-sql_test_index_account GROUP BY state.keyword");
}

public ResponseExceptionAssertion expectResponseException() {
return new ResponseExceptionAssertion(exceptionRule);
}
Expand Down