Skip to content

Support group by span over time based column with Span UDF#3421

Merged
LantaoJin merged 13 commits intoopensearch-project:feature/calcite-enginefrom
songkant-aws:songkant/feature/calcite-engine
Mar 19, 2025
Merged

Support group by span over time based column with Span UDF#3421
LantaoJin merged 13 commits intoopensearch-project:feature/calcite-enginefrom
songkant-aws:songkant/feature/calcite-engine

Conversation

@songkant-aws
Copy link
Contributor

@songkant-aws songkant-aws commented Mar 13, 2025

Description

Support group by span over time based column with Span UDF

Related Issues

Resolves #3354

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Songkan Tang <songkant@amazon.com>
…ngine

Signed-off-by: Songkan Tang <songkant@amazon.com>
date.atStartOfDay().atZone(ZoneOffset.UTC).toInstant().toEpochMilli(), interval);
return SqlFunctions.timestampToDate(dateEpochValue);
case SqlTypeName.TIME:
if (dateTimeUnit.getId() > 4) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why need this limitation?

Copy link
Contributor Author

@songkant-aws songkant-aws Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment inline. The TIME type usually means date format without year, month, etc like the field '17:59:59.99'

* day, 1 month, 1 hour
* </ol>
*/
public class SpanFunction implements UserDefinedFunction {
Copy link
Collaborator

@qianheng-aws qianheng-aws Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe leave a TODO here to do refactoring in future. Your previous idea of implementing a self-defined implementor seems to be a better approach.

Looks like it could be implemented by replacing ScalarFunction(we currently use) with ImplementableFunction, and move logic for handling different filed type to the implementor. In that way, this span function should be simpler with only handling timestamp type.

Then, we may have different functions like dateToTs, tsToDate, span(ForTs) and provide us more flexibility to reuse them or combine them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TODO

return SqlLibraryOperators.DATEADD;
// UDF Functions
case "SPAN":
return TransferUserDefinedFunction(SpanFunction.class, "SPAN", ReturnTypes.ARG0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReturnTypes.ARG0 or ReturnTypes.ARG0_NULLABLE? What if the timestamp is null in span(timestamp, ...)?

Copy link
Contributor Author

@songkant-aws songkant-aws Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to ReturnTypes.ARG0_NULLABLE but it will transform it to default long or int value like 0L or 0 in linq4j generated code. Still looking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved the groupby nulls issue due to sql type creation conflicts in RexNodeBuilder.

executeQuery(
String.format(
"source=%s | stats avg(balance) by span(birthdate, 1 day) as age_balance",
"source=%s | stats avg(balance) by span(birthdate, 1 month) as age_balance",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need some ITs for such as 15 minutes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 15 minutes IT.

Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
@LantaoJin
Copy link
Member

LantaoJin commented Mar 18, 2025

PPLQueryDataAnonymizer and PPLQueryDataAnonymizerTest are required to update for span expression. @songkant-aws

@LantaoJin
Copy link
Member

Flaky test

CalcitePPLStringBuiltinFunctionIT > testPosition FAILED
    java.lang.AssertionError: expected:<1> but was:<2>
        at __randomizedtesting.SeedInfo.seed([EFDC39F9CD048074:F34E991E6AAE7EDB]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.opensearch.sql.util.MatcherUtils.verify(MatcherUtils.java:192)
        at org.opensearch.sql.util.MatcherUtils.verifyDataRows(MatcherUtils.java:160)
        at org.opensearch.sql.calcite.standalone.CalcitePPLStringBuiltinFunctionIT.testPosition(CalcitePPLStringBuiltinFunctionIT.java:161)

Let me trigger a re-run attempt.

cc @yuancu

@LantaoJin LantaoJin added the calcite calcite migration releated label Mar 18, 2025
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
@songkant-aws
Copy link
Contributor Author

PPLQueryDataAnonymizer and PPLQueryDataAnonymizerTest are required to update for span expression. @songkant-aws

Added corresponding change and more IT to cover span over different kinds of date formats.

@songkant-aws
Copy link
Contributor Author

Looks like we have some flaky tests, this time there is one in build-windows-macos, which is irrelevant to this PR's change.

org.opensearch.sql.calcite.standalone.CalcitePPLStringBuiltinFunctionIT.testSubstring(CalcitePPLStringBuiltinFunctionIT.java:148)

@LantaoJin LantaoJin merged commit b99130a into opensearch-project:feature/calcite-engine Mar 19, 2025
18 of 20 checks passed
@LantaoJin
Copy link
Member

Failure due to flaky test, merged.

penghuo pushed a commit that referenced this pull request Jun 16, 2025
* Support group by span over time based column with Span UDF

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Minor fix UT and IT after merge

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Add custom minute span IT

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Remove print

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix spotless style

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix groupby nulls issue

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Remove unnecessary sql type creation in RexNodeBuilder

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix rounding for date and time type

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Add more tests

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Revert unrelated change in build.gradle

Signed-off-by: Songkan Tang <songkant@amazon.com>

---------

Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
@songkant-aws songkant-aws deleted the songkant/feature/calcite-engine branch October 9, 2025 02:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants