[Calcite Engine] Function framework refactoring#3522
Conversation
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
| for (Map.Entry<CalciteFuncSignature, FunctionImp> implement : implementList) { | ||
| if (implement.getKey().match(functionName.getName(), argTypes)) { | ||
| return implement.getValue(); | ||
| } | ||
| } |
There was a problem hiding this comment.
In worst case, what is the maximum number of signatures that would need to be checked?
There was a problem hiding this comment.
The maximum is 2 only for LOG, while others always have only 1. LOG has 2 implementation -- LOG($1) and LOG($1, $2), the former is translated to LOG($1, E).
And we expect different implementation of one function should be included in the implementor of its related operator as much as possible, i.e. implementation details should be hidden in the physical-level instead of logical-level, unless we want to expose the logical detail of the implementation on purpose.(like if want integral division for integer, we may need the new added cast calling is exposed in the plan instead of being hidden in the implementor).
So in the visible future, the maximum number should be only 1 or very small. And we can deprecate this extra level if we strictly follow the above rule and requires only 1-to-1 mapping between function to operator.
| import org.apache.calcite.sql.type.SqlReturnTypeInference; | ||
| import org.apache.calcite.sql.validate.SqlUserDefinedFunction; | ||
|
|
||
| public interface UserDefinedFunctionHelper { |
There was a problem hiding this comment.
Why a helper an interface? Can be UserDefinedFunctionFactory or `UserDefinedFunctionTemplate?
| default SqlOperandMetadata getOperandMetadata() { | ||
| return null; | ||
| } | ||
| ; |
| case EXPR_TIME -> "evalTime"; | ||
| case EXPR_TIMESTAMP -> "evalTimestamp"; | ||
| default -> throw new IllegalArgumentException( | ||
| String.format("Unsupported expr udt: %s", exprSqlType.getUdt())); |
There was a problem hiding this comment.
UDT is internal implementation, we should hide internal error message from user.
| SpanFunctionImpl.class, methodName, String.class, int.class, String.class)); | ||
| return function.getImplementor().implement(translator, call, RexImpTable.NullAs.NULL); | ||
| } | ||
| throw new IllegalArgumentException(String.format("Unsupported RelDataType: %s", fieldType)); |
There was a problem hiding this comment.
RelDataType is internal implementation, we should hide internal error message from user.
| import org.apache.calcite.rel.type.RelDataType; | ||
|
|
||
| /** Function signature is composed by function name and arguments list. */ | ||
| public record CalciteFuncSignature(FunctionName functionName, List<RelDataType> funcArgTypes) { |
There was a problem hiding this comment.
Do we really need need this signature? If we follow the SpanFunctionImpl, we could put all real implement function like evalDate into same class. Also, the argument validation check would be in another PR. I think the only reason here is about implicit conversion. But I think it's already disabled in calcite now?
There was a problem hiding this comment.
I'm wondering if there is chance that:
-
an function may has different operator depends on params' type or size and wants this information exposed on our plan. Although, by most cases, we expect the one-to-one relationship between function and operator and hide different processing logic in one operator.
-
different functions map to one operator while each of them has their specific limitation on the params' size or type. E.g. We use operator
NOT_EQUALfor both function!=andXOR, whileXORhas its limitation that only accepting boolean, so we could put this limitation check on its signature. Of course, we can implementXORby an UDF with more efforts. It depends on whether we want to expose this implementation details to customer in the plan or not.
It stills remains discussion on the above 2 cases. We can keep this redundant level here since it doesn't harm the functionality or performance, until the final decision in the end.
| import org.apache.calcite.schema.impl.ScalarFunctionImpl; | ||
|
|
||
| /** The UDF which implements its functionality by invoking the `eval` method of a class */ | ||
| public abstract class ScalarUDF implements UserDefinedFunctionHelper { |
There was a problem hiding this comment.
Almost all UDF functions has different implementations for different arguments. So I think this class is useless?
There was a problem hiding this comment.
Yeah, useless for now. I'm not sure if it could be useful in the future.
This is for the case where one reflective method could cover the functionality of a operator, like the BuiltInMethod in calcite. For more complex implementation like handling different arguments needs different logic, should use ImplementorUDF
| * all private Use method directly in {@link BuiltInMethod} if possible, most operators' | ||
| * implementor could be substituted by a single method. | ||
| */ | ||
| private static Expression invokeCalciteImplementor( |
There was a problem hiding this comment.
How does this method invoke with private static
| } | ||
| } | ||
|
|
||
| @Strict |
There was a problem hiding this comment.
This annotation applied to a user-defined function that indicates that the function returns null if and only if one or more of its arguments are null. please confirm
There was a problem hiding this comment.
Yes, I think that satisfy our expectation.
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
…function-refactoring # Conflicts: # core/src/main/java/org/opensearch/sql/calcite/utils/BuiltinFunctionUtils.java # core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java
Signed-off-by: Heng Qian <qianheng@amazon.com>
--------- Signed-off-by: Heng Qian <qianheng@amazon.com> Signed-off-by: xinyual <xinyual@amazon.com>
Description
Do refactoring for the framework of function.
It includes change or enhancement:
PPLFuncImpTablefor registering logical-level implementation for resolving a BuiltinFunctionName. It also includes resolving for functions with different parameters by using CalciteFuncSignature.PPLBuiltinOperatorsfor all self-implemented UDFs. This class is a physical-level implementation of an operator.SqlStdOperatorTableandSqlLibraryOperatorsto the new framework, and also migrate the span function and typeof function.The new framework aims to hide all implementation at the physical level, so the plan will be more concise and clear.
Take
spanfor example: for a time-related field, span will use a ScalarFunctionImpl (physical-level implementation, can only seespan(\$1, \$2, \$3)in the plan); while using logical implementation likeFLOOR($1 / $2) * $2for a non-time field. This makes the plan weird and confusing, and it also blocks push-down since the push-down rule takes effect by identifying thespanfunction operator. After migrating, theSPANoperator is able to handle both cases internally.PPLFuncImpTablewill provide more flexible implementation of a function than only using SqlOperator.Take
typeoffor example, after migrating, the new implementation will replace using ScalarFunctionTYPEOF(\$1)with returning aRexLiteraldirectly, which avoids row-by-row evaluation.Related Issues
Resolves #3493
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.