This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Comparing date, timestamp and string lead to failures and Date formatted as timestamp #1056

Open
FreCap opened this issue Feb 22, 2021 · 16 comments
Assignees
Labels
Breaking Changes Breaking Changes that will impact clients SQL

Comments

@FreCap
Contributor

FreCap commented Feb 22, 2021

Version 1.13 (might be a regression from previous versions)

The following indexing request:

POST my_test_index_date/_doc/
{
  "field1": {
    "subFieldA": "2020-02-02"
  }
}

generates the following mapping specifying the field as DATE:

{
  "mappings": {
    "_doc": {
      "properties": {
        "field1": {
          "properties": {
            "subFieldA": {
              "type": "date"
            }
          }
        }
      }
    }
  }
}

When I try to query it with different combinations that I feel should be acceptable, it always fails:

Case 1

POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i 
  WHERE 
  i.field1.subFieldA > '2020-02-01'
    LIMIT 50;"""
}

Response -->
{
  "error": {
    "reason": "There was internal problem at backend",
    "details": "> function expected {[BYTE,BYTE],[SHORT,SHORT],[INTEGER,INTEGER],[LONG,LONG],[FLOAT,FLOAT],[DOUBLE,DOUBLE],[BOOLEAN,BOOLEAN],[STRING,STRING],[TIMESTAMP,TIMESTAMP],[DATE,DATE],[TIME,TIME],[DATETIME,DATETIME],[INTERVAL,INTERVAL],[STRUCT,STRUCT],[ARRAY,ARRAY]}, but get [TIMESTAMP,STRING]",
    "type": "ExpressionEvaluationException"
  },
  "status": 503
}

Why is the field considered TIMESTAMP here instead of DATE (as in the mapping)?

It'd be very nice to have the two types (TIMESTAMP and STRING) be comparable, since we don't want users to always have to know what kind of date type is in the mapping.

Case 2


POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i 
  WHERE 
  i.field1.subFieldA > DATE('2020-02-01')
    LIMIT 50;"""
}

Response -->
{
  "error": {
    "reason": "There was internal problem at backend",
    "details": "> function expected {[BYTE,BYTE],[SHORT,SHORT],[INTEGER,INTEGER],[LONG,LONG],[FLOAT,FLOAT],[DOUBLE,DOUBLE],[BOOLEAN,BOOLEAN],[STRING,STRING],[TIMESTAMP,TIMESTAMP],[DATE,DATE],[TIME,TIME],[DATETIME,DATETIME],[INTERVAL,INTERVAL],[STRUCT,STRUCT],[ARRAY,ARRAY]}, but get [TIMESTAMP,DATE]",
    "type": "ExpressionEvaluationException"
  },
  "status": 503
}

Shouldn't TIMESTAMP and DATE be comparable?
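For comparison, standard SQL engines allow this kind of mixed comparison; in PostgreSQL, for example, a DATE operand is implicitly cast to TIMESTAMP at midnight (a sketch, assuming default PostgreSQL behavior):

```sql
-- PostgreSQL implicitly widens DATE to TIMESTAMP (midnight of that day),
-- so comparing the two types directly is allowed:
SELECT DATE '2020-02-01' < TIMESTAMP '2020-02-01 00:00:01';  -- true
```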

Case 3

POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i 
  WHERE 
  (i.field1.subFieldA) > TIMESTAMP('2020-02-01')
    LIMIT 50;"""
}

Response --> timestamp parse error
{
  "error": {
    "type": "SearchPhaseExecutionException",
    "reason": "Error occurred in Elasticsearch engine: all shards failed",
    "details": "Shard[0]: NotSerializableExceptionWrapper[semantic_check_exception: timestamp:2020-02-01 in unsupported format, please use yyyy-MM-dd HH:mm:ss[.SSSSSS]]\n\nFor more details, please send request for Json format to see the raw response from elasticsearch engine."
  },
  "status": 503
}
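Based on that error message, the TIMESTAMP literal presumably needs the full yyyy-MM-dd HH:mm:ss format. A sketch of Case 3 adjusted accordingly (untested, derived only from the format the error asks for):

```json
POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i
  WHERE
  (i.field1.subFieldA) > TIMESTAMP('2020-02-01 00:00:00')
    LIMIT 50;"""
}
```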

Case 4, working, but unexpected output


POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i 
  WHERE 
  DATE(i.field1.subFieldA) > DATE('2020-02-01')
    LIMIT 50;"""
}

Response --> DATE returned in timestamp format instead of the DATE format from the mapping
{
  "schema": [
    {
      "name": "field1",
      "type": "object"
    }
  ],
  "datarows": [
    [
      {
        "subFieldA": "2020-02-02 00:00:00"
      }
    ]
  ],
  "total": 1,
  "size": 1,
  "status": 200
}

@penghuo
Contributor

penghuo commented Feb 24, 2021

@penghuo penghuo added the bug Something isn't working label Feb 24, 2021
@FreCap
Contributor Author

FreCap commented Feb 24, 2021

Also, I believe this is a breaking change.

@dai-chen
Member

dai-chen commented Mar 1, 2021

Also, I believe this is a breaking change.

Yeah, I checked that our old engine can handle these queries without issue. The reason is that, regardless of the field value, there is only one type called date in Elasticsearch, which is represented internally as an epoch value. When we developed the new engine, we assumed that raw data fetched from Elasticsearch is always a timestamp. We will revisit this and see how to make it more user friendly while following the SQL standard as much as possible. Thanks!

@dai-chen dai-chen added Breaking Changes Breaking Changes that will impact clients SQL and removed bug Something isn't working labels Mar 1, 2021
@dai-chen
Member

dai-chen commented Mar 1, 2021

Issue may be related: #510

@FreCap
Contributor Author

FreCap commented Mar 2, 2021

@dai-chen Shouldn't it fall back to the old one automatically when the query throws?

If not, could we enable that fallback somehow?

@dai-chen
Member

dai-chen commented Mar 2, 2021

@dai-chen Shouldn't it fall back to the old one automatically when the query throws?

If not, could we enable that fallback somehow?

Yes, the fallback happens automatically. However, it mostly happens during parsing: the ANTLR grammar in the new engine decides whether we should fall back. After that, the query is considered one the new engine can handle. Semantic checks, like the error you saw, fall into this case.

@FreCap
Contributor Author

FreCap commented Mar 3, 2021

@dai-chen Shouldn't it fall back to the old one automatically when the query throws?
If not, could we enable that fallback somehow?

Yes, the fallback happens automatically. However, it mostly happens during parsing: the ANTLR grammar in the new engine decides whether we should fall back. After that, the query is considered one the new engine can handle. Semantic checks, like the error you saw, fall into this case.

Is it possible to force both kinds of errors to fall back to the old engine?
Would it be easy to make a PR?

There are a lot of great features in this release that I would love to continue using, but there are a few breaking changes that, from what I can see, could take several weeks or more to solve.

Recovering from both error types would give us the best of both worlds and help adoption.

@dai-chen
Member

dai-chen commented Mar 3, 2021

@dai-chen Shouldn't it fall back to the old one automatically when the query throws?
If not, could we enable that fallback somehow?

Yes, the fallback happens automatically. However, it mostly happens during parsing: the ANTLR grammar in the new engine decides whether we should fall back. After that, the query is considered one the new engine can handle. Semantic checks, like the error you saw, fall into this case.

Is it possible to force both kinds of errors to fall back to the old engine?
Would it be easy to make a PR?

There are a lot of great features in this release that I would love to continue using, but there are a few breaking changes that, from what I can see, could take several weeks or more to solve.

Recovering from both error types would give us the best of both worlds and help adoption.

Sure, thanks for helping us find this and other breaking changes! We're looking into all recent issues and will prepare PRs for the high-priority ones.

@FreCap
Contributor Author

FreCap commented Mar 8, 2021

As an additional data point, another valid format in PostgreSQL is 20181231 (as opposed to 2018-12-31).

@dai-chen
Member

@FreCap sure, I'm working on a PR to support implicit casts. I will reference MySQL and PostgreSQL behavior. Thanks!

@dai-chen dai-chen self-assigned this Mar 11, 2021
@dai-chen
Member

Probably related: #803

@gauravlanjekar

I am also facing the same issue with date ranges; I am currently using BETWEEN instead of the > and < operators to compare dates and strings.
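For reference, a sketch of the BETWEEN workaround described above, reusing the index and field names from this issue (assumed, not tested):

```json
POST _opendistro/_sql
{
  "query": """SELECT * FROM
  my_test_index_date as i
  WHERE
  i.field1.subFieldA BETWEEN '2020-02-01 00:00:00' AND '2020-02-03 00:00:00'
    LIMIT 50;"""
}
```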

@FreCap
Contributor Author

FreCap commented Jul 22, 2021

@gauravlanjekar

@FreCap Unfortunately, I am running Open Distro on AWS ES, so I don't have the option of running the other branch. :(
Do you see any problem with using BETWEEN? That is my current workaround until a fix is available.

@dai-chen
Member

dai-chen commented Apr 5, 2022

I tested with OpenSearch 1.1, which supports implicit conversion between date and string. The following query worked for me:

POST my_test_index_date/_doc/
{
  "field1": {
    "subFieldA": "2020-02-02"
  }
}

POST _plugins/_sql
{
  "query": """
    SELECT * FROM
      my_test_index_date as i 
    WHERE 
      i.field1.subFieldA > '2020-02-01 00:00:00'
    LIMIT 50;
  """
}
{
  "schema": [
    {
      "name": "field1",
      "type": "object"
    }
  ],
  "datarows": [
    [
      {
        "subFieldA": "2020-02-02 00:00:00"
      }
    ]
  ],
  "total": 1,
  "size": 1,
  "status": 200
}

I'm investigating why '2020-02-01' or '2020-02-01T00:00:00Z' returns nothing.

@dharminfadia

Hello everyone,

I am trying to change a data type using Query Workbench: I want to convert a date column to timestamp in the output schema, as in other databases, but I am not able to, and OpenSearch doesn't seem to support a CAST-style query. Can anyone help? It's urgent.

query - select timestamp(datecolumn) as date from table

The above query is not working, please help!
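If your OpenSearch version supports it, a CAST expression may be worth trying instead of the timestamp() function. A hedged sketch (the column and table names are placeholders taken from the question above, and the availability of CAST depends on your version):

```json
POST _plugins/_sql
{
  "query": "SELECT CAST(datecolumn AS TIMESTAMP) AS date FROM table"
}
```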
