Skip to content

[Calcite Engine] Function framework refactoring#3522

Merged
penghuo merged 6 commits into
opensearch-project:mainfrom
qianheng-aws:calcite-function-refactoring
Apr 9, 2025
Merged

[Calcite Engine] Function framework refactoring#3522
penghuo merged 6 commits into
opensearch-project:mainfrom
qianheng-aws:calcite-function-refactoring

Conversation

@qianheng-aws
Copy link
Copy Markdown
Collaborator

@qianheng-aws qianheng-aws commented Apr 7, 2025

Description

Do refactoring for the framework of function.

It includes change or enhancement:

  1. Add PPLFuncImpTable for registering logical-level implementation for resolving a BuiltinFunctionName. It also includes resolving for functions with different parameters by using CalciteFuncSignature.
  2. Add PPLBuiltinOperators for all self-implemented UDFs. This class is a physical-level implementation of an operator.
  3. Migrate all SqlStdOperatorTable and SqlLibraryOperators to the new framework, and also migrate the span function and typeof function.
  4. Make the UDF static instead of creating a new one when calling visitFunction.
  5. Replace function name(STRING) with BuiltinFunctionName(ENUM) to strengthen code robustness

The new framework aims to hide all implementation at the physical level, so the plan will be more concise and clear.

Take span for example: for a time-related field, span will use a ScalarFunctionImpl (physical-level implementation, can only see span(\$1, \$2, \$3) in the plan); while using logical implementation like FLOOR($1 / $2) * $2 for a non-time field. This makes the plan weird and confusing, and it also blocks push-down since the push-down rule takes effect by identifying the span function operator. After migrating, the SPAN operator is able to handle both cases internally.

PPLFuncImpTable will provide more flexible implementation of a function than only using SqlOperator.

Take typeof for example, after migrating, the new implementation will replace using ScalarFunction TYPEOF(\$1) with returning a RexLiteral directly, which avoids row-by-row evaluation.

Related Issues

Resolves #3493

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
@noCharger noCharger added the calcite calcite migration releated label Apr 7, 2025
Comment thread core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java Outdated
Comment thread core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java Outdated
Comment on lines +194 to +198
for (Map.Entry<CalciteFuncSignature, FunctionImp> implement : implementList) {
if (implement.getKey().match(functionName.getName(), argTypes)) {
return implement.getValue();
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In worst case, what is the maximum number of signatures that would need to be checked?

Copy link
Copy Markdown
Collaborator Author

@qianheng-aws qianheng-aws Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The maximum is 2 only for LOG, while others always have only 1. LOG has 2 implementation -- LOG($1) and LOG($1, $2), the former is translated to LOG($1, E).

And we expect different implementation of one function should be included in the implementor of its related operator as much as possible, i.e. implementation details should be hidden in the physical-level instead of logical-level, unless we want to expose the logical detail of the implementation on purpose.(like if want integral division for integer, we may need the new added cast calling is exposed in the plan instead of being hidden in the implementor).

So in the visible future, the maximum number should be only 1 or very small. And we can deprecate this extra level if we strictly follow the above rule and requires only 1-to-1 mapping between function to operator.

import org.apache.calcite.sql.type.SqlReturnTypeInference;
import org.apache.calcite.sql.validate.SqlUserDefinedFunction;

public interface UserDefinedFunctionHelper {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add doc

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a helper an interface? Can be UserDefinedFunctionFactory or `UserDefinedFunctionTemplate?

default SqlOperandMetadata getOperandMetadata() {
return null;
}
;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless?

case EXPR_TIME -> "evalTime";
case EXPR_TIMESTAMP -> "evalTimestamp";
default -> throw new IllegalArgumentException(
String.format("Unsupported expr udt: %s", exprSqlType.getUdt()));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UDT is internal implementation, we should hide internal error message from user.

SpanFunctionImpl.class, methodName, String.class, int.class, String.class));
return function.getImplementor().implement(translator, call, RexImpTable.NullAs.NULL);
}
throw new IllegalArgumentException(String.format("Unsupported RelDataType: %s", fieldType));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RelDataType is internal implementation, we should hide internal error message from user.

import org.apache.calcite.rel.type.RelDataType;

/** Function signature is composed by function name and arguments list. */
public record CalciteFuncSignature(FunctionName functionName, List<RelDataType> funcArgTypes) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need need this signature? If we follow the SpanFunctionImpl, we could put all real implement function like evalDate into same class. Also, the argument validation check would be in another PR. I think the only reason here is about implicit conversion. But I think it's already disabled in calcite now?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there is chance that:

  1. an function may has different operator depends on params' type or size and wants this information exposed on our plan. Although, by most cases, we expect the one-to-one relationship between function and operator and hide different processing logic in one operator.

  2. different functions map to one operator while each of them has their specific limitation on the params' size or type. E.g. We use operator NOT_EQUAL for both function != and XOR, while XOR has its limitation that only accepting boolean, so we could put this limitation check on its signature. Of course, we can implement XOR by an UDF with more efforts. It depends on whether we want to expose this implementation details to customer in the plan or not.

It stills remains discussion on the above 2 cases. We can keep this redundant level here since it doesn't harm the functionality or performance, until the final decision in the end.

import org.apache.calcite.schema.impl.ScalarFunctionImpl;

/** The UDF which implements its functionality by invoking the `eval` method of a class */
public abstract class ScalarUDF implements UserDefinedFunctionHelper {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all UDF functions has different implementations for different arguments. So I think this class is useless?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, useless for now. I'm not sure if it could be useful in the future.

This is for the case where one reflective method could cover the functionality of a operator, like the BuiltInMethod in calcite. For more complex implementation like handling different arguments needs different logic, should use ImplementorUDF

* all private Use method directly in {@link BuiltInMethod} if possible, most operators'
* implementor could be substituted by a single method.
*/
private static Expression invokeCalciteImplementor(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this method invoke with private static

Comment thread core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java Outdated
Comment thread core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java Outdated
Comment thread core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java Outdated
Comment thread core/src/main/java/org/opensearch/sql/expression/function/SpanFunctionImpl.java Outdated
}
}

@Strict
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This annotation applied to a user-defined function that indicates that the function returns null if and only if one or more of its arguments are null. please confirm

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that satisfy our expectation.

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
…function-refactoring

# Conflicts:
#	core/src/main/java/org/opensearch/sql/calcite/utils/BuiltinFunctionUtils.java
#	core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java
Signed-off-by: Heng Qian <qianheng@amazon.com>
@penghuo penghuo merged commit b0384a8 into opensearch-project:main Apr 9, 2025
22 checks passed
penghuo pushed a commit that referenced this pull request Jun 16, 2025
---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
@qianheng-aws qianheng-aws deleted the calcite-function-refactoring branch March 26, 2026 10:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC][Enhancement][Calcite Engine] UDF utils refactoring

5 participants