Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

SELECT nested() cannot get sub properties #963

Open
FreCap opened this issue Jan 6, 2021 · 6 comments
Open

SELECT nested() cannot get sub properties #963

FreCap opened this issue Jan 6, 2021 · 6 comments
Labels
bug Something isn't working good first issue Good for newcomers SQL

Comments

@FreCap
Copy link
Contributor

FreCap commented Jan 6, 2021

How to reproduce:


DELETE /my_index_nested_sub_field

PUT /my_index_nested_sub_field
{
  "mappings": {
    "properties": {
      "myNestedField": {
        "type": "nested"
      }
    }
  }
}

POST my_index_nested_sub_field/_doc/
{
  "id": 2,
  "myNestedField": [
    {
      "myNestedId": 7,
      "myNestedFieldString": "5",
      "myNestedFieldObject": {
        "val1":1,
        "val2":2,
        "val3":3
      }
    },
    {
      "myNestedId": 8,
      "myNestedFieldString": "3",
      "myNestedFieldObject": {
        "val1":4,
        "val2":5,
        "val3":6
      }
    }
  ]
}

1) Works as expected, contains both myNestedFieldObject and myNestedId


POST _opendistro/_sql
{
  "query": """
SELECT  nested(myNestedField.myNestedFieldObject), nested(myNestedField.myNestedId) FROM my_index_nested_sub_field
  """
}

2) Doesn't work as expected: partial response, doesn't contain myNestedFieldObject


POST _opendistro/_sql
{
  "query": """
SELECT  nested(myNestedField.*) FROM my_index_nested_sub_field
  """
}

Response doesn't contain myNestedFieldObject

{
  "schema": [
    {
      "name": "myNestedField.myNestedId",
      "type": "long"
    },
    {
      "name": "myNestedField.myNestedFieldString",
      "type": "text"
    }
  ],
  "total": 2,
  "datarows": [
    [
      7,
      "5"
    ],
    [
      8,
      "3"
    ]
  ],
  "size": 2,
  "status": 200
}

3) Doesn't work as expected: throws exception

POST _opendistro/_sql
{
  "query": """
SELECT  nested(myNestedField.myNestedFieldObject.val1)  FROM my_index_nested_sub_field
  """
}

Response:
{
  "error": {
    "reason": "Error occurred in Elasticsearch engine: all shards failed",
    "details": """Shard[0]: [my_index_nested_sub_field/WWDKmuGGSy6sdOcG8JkInA] QueryShardException[failed to create query: [nested] nested object under path [myNestedField.myNestedFieldObject] is not of nested type]; nested: IllegalStateException[[nested] nested object under path [myNestedField.myNestedFieldObject] is not of nested type];

For more details, please send request for Json format to see the raw response from elasticsearch engine.""",
    "type": "SearchPhaseExecutionException"
  },
  "status": 400
}

Same for adding myNestedField,

POST _opendistro/_sql
{
  "query": """
SELECT  nested(***myNestedField***,myNestedField.myNestedFieldObject.val1)  FROM my_index_nested_sub_field
  """
}

Response:
{
  "error": {
    "reason": "Error occurred in Elasticsearch engine: all shards failed",
    "details": """Shard[0]: [my_index_nested_sub_field/WWDKmuGGSy6sdOcG8JkInA] QueryShardException[failed to create query: [nested] failed to find nested object under path [myNestedField.myNestedFieldObject.val1]]; nested: IllegalStateException[[nested] failed to find nested object under path [myNestedField.myNestedFieldObject.val1]];

For more details, please send request for Json format to see the raw response from elasticsearch engine.""",
    "type": "SearchPhaseExecutionException"
  },
  "status": 400
}
@FreCap FreCap changed the title SELECT nested cannot get sub properties SELECT nested() cannot get sub properties Jan 6, 2021
@chloe-zh
Copy link
Member

chloe-zh commented Jan 7, 2021

Hi @FreCap , thanks for reporting this issue! I was able to reproduce these cases.

1
For the second case SELECT nested(myNestedField.*) FROM my_index_nested_sub_field, since the old engine was once not able to convert nested type to a standard type which our JDBC driver supported, so we disabled the nested field to be selected when the query is to select all (select *) or select wildcard (e.g. select myNestedField.*) in the legacy codebase.

Therefore the myNestedField.myNestedFieldObject is left out in the response. The workaround would be enable the nested type back since the type conversion is supported in the latest code version now. We will have it fixed once we have resources.

One option before the fix is to set the format to JSON (which are the search hits as response caught from the ES engine) rather than the default JDBC format by adding the param ?format=json at the end of the request, and it works well as I test out. For example:

POST _opendistro/_sql?format=json
{
  "query": "select nested(myNestedField.*) from nested_data"
}

response:
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "nested_data",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "myNestedField" : [
            {
              "myNestedId" : 7,
              "myNestedFieldString" : "5",
              "myNestedFieldObject" : {
                "val1" : 1,
                "val2" : 2,
                "val3" : 3
              }
            },
            {
              "myNestedId" : 8,
              "myNestedFieldString" : "3",
              "myNestedFieldObject" : {
                "val1" : 4,
                "val2" : 5,
                "val3" : 6
              }
            }
          ]
        },
        "inner_hits" : {
          "myNestedField" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : 1.0,
              "hits" : [
                {
                  "_index" : "nested_data",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "myNestedField",
                    "offset" : 0
                  },
                  "_score" : 1.0,
                  "_source" : {
                    "myNestedId" : 7,
                    "myNestedFieldObject" : {
                      "val3" : 3,
                      "val2" : 2,
                      "val1" : 1
                    },
                    "myNestedFieldString" : "5"
                  }
                },
                {
                  "_index" : "nested_data",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "myNestedField",
                    "offset" : 1
                  },
                  "_score" : 1.0,
                  "_source" : {
                    "myNestedId" : 8,
                    "myNestedFieldObject" : {
                      "val3" : 6,
                      "val2" : 5,
                      "val1" : 4
                    },
                    "myNestedFieldString" : "3"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

2
For case SELECT nested(myNestedField.myNestedFieldObject.val1) FROM my_index_nested_sub_field, the stack trace indicates that myNestedField.myNestedFieldObject is not of nested type, so the nested operation is not eligible here. Currently we do not support either object type in nested fields or deep nested fields yet, but we will have workaround to support nested fields in our new engine very soon. We will keep you updated once such features are good to release. You can also checkout our release notes in the repository or the announcements in the ODFE website to see the latest features. Thanks!

@chloe-zh chloe-zh added bug Something isn't working SQL good first issue Good for newcomers labels Jan 7, 2021
@FreCap
Copy link
Contributor Author

FreCap commented Jan 7, 2021

Thank you @chloe-zh!

I saw I could use format=json, but in this phase of dev I was trying to switch to full jdbc given how hard to read the json format for complex aggregates can be.

, the stack trace indicates that myNestedField.myNestedFieldObject is not of nested type,
In which file do you find the stacktrace? I've been trying to look for it but I couldn't find it in my cluster

Thank you for keeping me up to date.
Francesco

@chloe-zh
Copy link
Member

chloe-zh commented Jan 7, 2021

Hi @FreCap , you can find the stack trace from es log in the node.

@FreCap
Copy link
Contributor Author

FreCap commented Jan 12, 2021

Thanks @chloe-zh!

As a suggestion, it'd be great having the log in the response without having to go to the machine. Having tens of machines (or not having access to them for security reasons), it becomes an extremely challenging task to complete

@FreCap
Copy link
Contributor Author

FreCap commented Jan 12, 2021

Do you have any suggestion on what I could do to contribute a fix? Any, even small, guidance would be greatly appreciated

@chloe-zh
Copy link
Member

Could you checkout the method populateColumns in legacy/src/main/java/com/amazon/opendistroforelasticsearch/sql/legacy/executor/format/SelectResultSet.java to try for a workaround if you'd like to. The issue should be rooted around there.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working good first issue Good for newcomers SQL
Projects
None yet
Development

No branches or pull requests

2 participants