Skip to content

Conversation

@dey4ss
Copy link
Member

@dey4ss dey4ss commented Sep 10, 2021

This PR adds SQL date intervals, i.e., supports statements such the following:

SELECT * FROM t 
   WHERE t.date BETWEEN '2021-01-01' AND DATE '2021-01-01' + 30 DAYS

This is achieved by a new type of literal/Expr, with type kExpressionLiteralInterval. It is returned as second operand of the binary plus (or minus, respectivley) expression, where the start date ist the first one. Only checks if duration is a positive integer and the time unit is any of second(s), minute(s), hour(s), day(s), month(s), year(s). Any further sanity checks are in response of the consuming system.

closes #184
closes #178

@dey4ss
Copy link
Member Author

dey4ss commented Sep 10, 2021

Regarding the syntax, cast(... as date) + INTERVAL n days is kind of a random: I've seen cast(... as date) + n days [1] (which is implemented as well), cast(... as date) + interval 'n' days [2], and Postgres supposes cast(.. as date) + interval 'n days'[3]. However, I thougt this to be natural. If there are opinions preferring any of the variations, I'm open to it.

@Bensk1 Bensk1 requested a review from mweisgut September 13, 2021 07:09
@dey4ss
Copy link
Member Author

dey4ss commented Sep 13, 2021

After revisiting the SQL standard and validation with Postgres, both INTERVAL 'n' days and INTERVAL 'n days' seems to be correct (but not INTERVAL n days. I will change the current behavior to apply this.

@mweisgut
Copy link
Contributor

I will do the review on Thursday at the latest.

| extract_expr
| cast_expr
| array_expr
| interval_expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to introduce an 'inverval_literal' in ll. 1029 ('literal: [...]') instead. (interval_expression is used in the SQL standard to enforce type compatibility (see http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt p.168 ), which we do not do in the hyrise parser)

Copy link
Member Author

@dey4ss dey4ss Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just about to change that. I guess Expr::isLiteral() should return true for interval literals, as well? And is there a reason not to include date literals here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I added isType(kExprLiteralDate) || isType(kExprLiteralInterval) to Expr::isLiteral().

@klauck
Copy link
Contributor

klauck commented Sep 14, 2021

After revisiting the SQL standard and validation with Postgres, both INTERVAL 'n' days and INTERVAL 'n days' seems to be correct (but not INTERVAL n days. I will change the current behavior to apply this.

Arrg, the interval syntax is complex and differs depending on the specific database system: Further, note, it is 'n' day, '1 day', and '5 days' (with and without 's').

@dey4ss
Copy link
Member Author

dey4ss commented Sep 15, 2021

Further, note, it is 'n' day, '1 day', and '5 days' (with and without 's').

This ambiguation works now.

Copy link
Contributor

@mweisgut mweisgut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the above comments, it would be helpful if the description of this PR summarizes the different ways to define the intervals, i.e., 'n' day, '1 day', and '5 days'.

!SELECT * FROM t WHERE a = DATE '2000-01-01' + INTERVAL '30' DAYS;
!SELECT * FROM t WHERE a = DATE '2000-01-01' + x DAYS;
!SELECT * FROM t WHERE a = DATE '2000-01-01' + INTERVAL 'x' DAY;
!SELECT * FROM t WHERE a = DATE '2000-01-01' + INTERVAL '3.3 DAYS';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if the two following statements should be correctly parsed. With this PR, they are successfully parsed. However, they do not make semantic sense, do they?

SELECT * FROM t WHERE a BETWEEN 3 DAYS AND 3 DAYS;
SELECT * FROM t WHERE a BETWEEN INTERVAL '3 DAYS' AND INTERVAL '3 DAYS';

Based on briefly checking the SQL specification, I think it should be parsable with the following rules, starting from a between predicate:

- between predicate
  - row value predicand
    - row value constructor predicand
      - common value expression
        - interval value expression
          - ...

So the question here would be whether we should let parsing those statements fail or if further sanity checks in Hyrise should reject such statements.

Copy link
Member Author

@dey4ss dey4ss Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to pass such checks to Hyrise. Also ensuring that intervals may only be input to '+' or '-' operations etc. would be out of the parser's scope for now. Regarding semantic sense, DB Fiddle suggests that Postgres has no problem with intervals in BETWEEN statements, e.g., SELECT INTERVAL '2 days' BETWEEN INTERVAL '1 day' AND INTERVAL '3 days'evaluates to 'true'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. @klauck and I also discussed that and we also think that such semantic checks should be part of Hyrise.

https://sqliteonline.com/ also suggests that Postgres returns true for SELECT INTERVAL '2 days' BETWEEN INTERVAL '1 day' AND INTERVAL '3 days'.

Copy link
Contributor

@mweisgut mweisgut Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mweisgut
Copy link
Contributor

Referencing #184 and #178.

@dey4ss dey4ss changed the title Add Date Intervals Add Interval Literal Sep 16, 2021
@mweisgut
Copy link
Contributor

mweisgut commented Sep 21, 2021

Looks good to me. Just one minor change request:
Could you please add SELECT (CAST('2002-5-01' as DATE) + INTERVAL '60 days');, which is the example query of #178 to queries-good.sql? Then, we can close the issue with this issue IMO.

@klauck could you take a final look at this PR?

src/sql/Expr.cpp Outdated

Expr* Expr::makeIntervalLiteral(int64_t duration, DatetimeField unit) {
Expr* e = new Expr(kExprLiteralInterval);
e->name = strdup("INTERVAL");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the use of the name field here (and suggest removing the line (and also in makeExtract and makeCast)). I think the purpose of the name field was storing actual string values, which may differ among different Expression instances (e.g. different string literals) and not the name of the expression type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, all changes look fine for me. Thanks for your revision.

Copy link
Member Author

@dey4ss dey4ss Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three name fields are now removed, tests + print methods are adjusted accordingly.

@mweisgut mweisgut merged commit f1afb9a into hyrise:master Sep 21, 2021
@dey4ss dey4ss deleted the date-interval branch September 21, 2021 11:51
@dey4ss dey4ss mentioned this pull request Oct 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DATE subtract with INTERVAL is not parsed Support for Date/Time Operations

3 participants