Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] IS NOT NULL with alias throws SemanticCheckException #292

Open
ankurgupta4891 opened this issue Nov 18, 2021 · 11 comments
Open

[BUG] IS NOT NULL with alias throws SemanticCheckException #292

ankurgupta4891 opened this issue Nov 18, 2021 · 11 comments

Comments

@ankurgupta4891
Copy link

Describe the bug
Using IS NOT NULL with alias throws SemanticCheckException.

To Reproduce
Sample query:

SELECT dest_city_name AS dc
FROM all_flights
WHERE dc IS NOT NULL

Expected behavior
Should be same as query:

SELECT dest_city_name AS dc
FROM all_flights
WHERE dest_city_name IS NOT NULL
@penghuo
Copy link
Collaborator

penghuo commented Jan 6, 2022

Thanks for reporting issue. Currently we doesn't support use alias in where clause. One workaround should be

SELECT dest_city_name AS dc
FROM all_flights
WHERE dest_city_name IS NOT NULL

@penghuo penghuo added enhancement New feature or request and removed untriaged labels Jan 6, 2022
@penghuo
Copy link
Collaborator

penghuo commented Jan 6, 2022

@ankurgupta4891 In opendistro-for-elasticsearch/sql#1145, you mentioned, "I tried to give simple version of our use case but we need to get alias support with IS NOT NULL"

Could you share your use case? Are u using tools to auto generate query?

@penghuo
Copy link
Collaborator

penghuo commented Mar 15, 2022

confirmed offline. Query is auto generated.

Next Step

We need to evaluate the tech solution. The challenge is that SQL statements is analyzed in predefined order of, FROM -> WHERE -> SELECT, alias defined in SELECT which is not visible to WHERE clause.

@jingyali-apr
Copy link

jingyali-apr commented Aug 31, 2022

Hey team,
I notice that IS NOT NULL/IS NULL with alias in ORDER BY also throws exception: "This query is not explainable."
(Using Query Workbench kibana plugin in ES 7.10)

SELECT dest_state_nm AS city FROM all_flights_integ_tests ORDER BY city IS NOT NULL

Could you help and check as well?
Thanks

@dai-chen
Copy link
Collaborator

dai-chen commented Sep 1, 2022

Hey team, I notice that IS NOT NULL/IS NULL with alias in ORDER BY also throws exception: "This query is not explainable." (Using Query Workbench kibana plugin in ES 7.10)

SELECT dest_state_nm AS city FROM all_flights_integ_tests ORDER BY city IS NOT NULL

Could you help and check as well? Thanks

@jingyali-apr Thanks for reporting the issue! Could you share your index mapping and some test data for investigation? And it would be helpful if you can find error stacktrace in ES log.

@jingyali-apr
Copy link

Hey team, I notice that IS NOT NULL/IS NULL with alias in ORDER BY also throws exception: "This query is not explainable." (Using Query Workbench kibana plugin in ES 7.10)

SELECT dest_state_nm AS city FROM all_flights_integ_tests ORDER BY city IS NOT NULL

Could you help and check as well? Thanks

@jingyali-apr Thanks for reporting the issue! Could you share your index mapping and some test data for investigation? And it would be helpful if you can find error stacktrace in ES log.

Hey @dai-chen, please check below samples.

Successful case without IS NOT NULL

POST _opendistro/_sql/
{
"query": "SELECT dest_state_nm AS state FROM all_flights_integ_tests GROUP BY state ORDER BY state limit 5"
}

{
  "schema": [
    {
      "name": "dest_state_nm",
      "alias": "state",
      "type": "keyword"
    }
  ],
  "datarows": [
    [
      "Alabama"
    ],
    [
      "Alaska"
    ],
    [
      "California"
    ],
    [
      "Colorado"
    ],
    [
      "Connecticut"
    ]
  ],
  "total": 5,
  "size": 5,
  "status": 200
}

Failed case with IS NOT NULL

POST _opendistro/_sql/
{
"query": "SELECT dest_state_nm AS state FROM all_flights_integ_tests GROUP BY state ORDER BY state IS NOT NULL limit 5"
}

{
  "error": {
    "reason": "Invalid SQL query",
    "details": "can't resolve Symbol(namespace=FIELD_NAME, name=state) in type env",
    "type": "SemanticCheckException"
  },
  "status": 400
}

Successful case without using alias

POST _opendistro/_sql/
{
"query": "SELECT dest_state_nm AS state FROM all_flights_integ_tests GROUP BY state ORDER BY dest_state_nm IS NOT NULL limit 5"
}

{
  "schema": [
    {
      "name": "dest_state_nm",
      "alias": "state",
      "type": "keyword"
    }
  ],
  "datarows": [
    [
      "Alabama"
    ],
    [
      "Wisconsin"
    ],
    [
      "West Virginia"
    ],
    [
      "Virginia"
    ],
    [
      "Vermont"
    ]
  ],
  "total": 5,
  "size": 5,
  "status": 200
}

Partial test data:

POST _opendistro/_sql/
{
"query": "SELECT dest_state_nm FROM all_flights_integ_tests limit 50"
}

{
  "schema": [
    {
      "name": "dest_state_nm",
      "type": "keyword"
    }
  ],
  "datarows": [
    [
      "Alaska"
    ],
    [
      "South Carolina"
    ],
    [
      "Michigan"
    ],
    [
      "Texas"
    ],
    [
      "Texas"
    ],
    [
      "New York"
    ],
    [
      "Michigan"
    ],
    [
      "Massachusetts"
    ],
    [
      "Texas"
    ],
    [
      "Hawaii"
    ],
    [
      "Hawaii"
    ],
    [
      "Alabama"
    ],
    [
      "Texas"
    ],
    [
      "Georgia"
    ],
    [
      "Ohio"
    ],
    [
      "Michigan"
    ],
    [
      "Texas"
    ],
    [
      "Ohio"
    ],
    [
      "New York"
    ],
    [
      "Virginia"
    ],
    [
      "Virginia"
    ],
    [
      "Michigan"
    ],
    [
      "Florida"
    ],
    [
      "Wisconsin"
    ],
    [
      "Texas"
    ],
    [
      "Mississippi"
    ],
    [
      "New York"
    ],
    [
      "Georgia"
    ],
    [
      "Texas"
    ],
    [
      "New Jersey"
    ],
    [
      "Texas"
    ],
    [
      "Louisiana"
    ],
    [
      "New York"
    ],
    [
      "Alabama"
    ],
    [
      "Texas"
    ],
    [
      "Kentucky"
    ],
    [
      "Texas"
    ],
    [
      "Georgia"
    ],
    [
      "Georgia"
    ],
    [
      "Texas"
    ],
    [
      "Georgia"
    ],
    [
      "Tennessee"
    ],
    [
      "New Jersey"
    ],
    [
      "Georgia"
    ],
    [
      "Georgia"
    ],
    [
      "Georgia"
    ],
    [
      "Georgia"
    ],
    [
      "Maryland"
    ],
    [
      "Colorado"
    ],
    [
      "Tennessee"
    ]
  ],
  "total": 50,
  "size": 50,
  "status": 200
}

Thanks

@dai-chen
Copy link
Collaborator

dai-chen commented Sep 1, 2022

@jingyali-apr Could you try to use the original field name in ORDER BY clause as below?

POST _opendistro/_sql
{
"query" : """
  SELECT dest_state_nm AS state FROM all_flights_integ_tests GROUP BY state ORDER BY dest_state_nm IS NOT NULL
 """
}

@jingyali-apr
Copy link

@jingyali-apr Could you try to use the original field name in ORDER BY clause as below?

POST _opendistro/_sql
{
"query" : """
  SELECT dest_state_nm AS state FROM all_flights_integ_tests GROUP BY state ORDER BY dest_state_nm IS NOT NULL
 """
}

Yeah, I tried this and it worked.

See my Successful case without using alias in above comment.

@dai-chen
Copy link
Collaborator

dai-chen commented Sep 1, 2022

@jingyali-apr Thanks for confirming! I may have found where things go wrong:

querySpec.replaceIfAliasOrOrdinal(items.get(i)),

When we try to replace alias in ORDER BY, we only replace ordinal or alias directly. In our case, state is NOT NULL is an expression which causes alias replacement fail. Will debug to confirm and open issue/PR to fix. However, just to let you know, the fix would be only available in latest OpenSearch version (probably 2.3 next). Thanks!

@Yury-Fridlyand
Copy link
Collaborator

https://stackoverflow.com/a/942592
You can only use column aliases in GROUP BY, ORDER BY, or HAVING clauses.

Standard SQL doesn't allow you to refer to a column alias in a WHERE clause. This restriction is imposed because when the WHERE code is executed, the column value may not yet be determined.

Copied from MySQL documentation

@jpluimers
Copy link

Copied from MySQL documentation

Note that was the [Wayback/Archive] MySQL :: MySQL 5.1 Reference Manual :: B.5.5.4 Problems with Column Aliases documentation until December 2009. After that it got rephrased (including "doesn't allow" to "disallow", "alias" to "aliases", "code" to "clause) and still is in the current form at [Wayback/Archive] MySQL :: MySQL 8.0 Reference Manual :: B.3.4.4 Problems with Column Aliases:

Standard SQL disallows references to column aliases in a WHERE clause. This restriction is imposed because when the WHERE clause is evaluated, the column value may not yet have been determined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request Priority-High
Projects
SQL/PPL Maintenance
High priority
Status: No status
Development

No branches or pull requests

8 participants