Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support hive scalar function #16737

Merged
merged 4 commits into from Dec 15, 2021

Conversation

yulongfufu
Copy link
Contributor

@yulongfufu yulongfufu commented Sep 13, 2021

Support Hive built-in scalar function in Presto.

These functions are already in Presto dependency(com.facebook.presto.hive:hive-apache). There are several steps to reuse Hive function :

  • Convert the type of parameters between presto function and Hive.
  • Bridge the instance of hive function into presto function.
  • Load hive function by using the feature of function namespace to load hive function namespace.

The overall changes are very less intrusive to the original code and when users don't use hive function namespace i.e.hive.default.function_name, no behavior changes.

Related issue: #16478
Proposal: #16592
Design doc: https://bytedance.feishu.cn/docs/doccnA5uuqTrBFIc1SiwrahulGh

Framework

image

Description of Changes

Add a new plugin presto-hive-functions

Introduce HiveFunctionRegistry interface to registry hive function. We can implement different FunctionRegistries to get different function classes (e.g. hive built-in function / user-defined hive function which stored in external storage).
Implement SimpleHiveFunctionRegistry to get hive scalar function class through FunctionRegistry which likes org.apache.hadoop.hive.ql.exec.FunctionRegistry.
Introduce encoder / decoder to convert type between Presto and Hive.
Introduce HiveScalarFunctionInvoker to create、initialize and bridge Hive scalar function instance. At last the real execution of Hive Function will be called through reflection i.e.Invoker.evaluate.
Introduce HiveFunctionNamespaceManager to load HiveFunctionNamespace.

Usage

It will load the hive built-in function at runtime when using the hive function namespace i.e.hive.default.function_name, but when users don't use hive catalog function namespace, no behavior changes.

For example:

Similar function which have the same function name:

select hive.default.concat('f', 'b'); Hive#substr
select concat('f', 'b'); Presto#substr

Hive function:

presto> SELECT hive.default.isnull(null);
      _col0       
------------------
true
(1 row)
presto> SELECT hive.default.str_to_map('a:1,b:2,c:3', ',', ':');
      _col0      
-----------------
 {a=1, b=2, c=3} 
(1 row)

Roadmap

It's the first PR which supports hive built-in scalar function. And then we will support hive built-in aggregation function and user-defined hive function which stored in external storage in later Prs.

  • support hive built-in scalar function.
  • support hive built-in aggregation function.
  • support user-defined hive function which stored in external storage.
  • support AUTOMATIC mode to invoke hive udf.
  • support hive simpleUdf / simpleUdaf (the FunctionRegestry in hive-3.x can not be initialize).
  • support constant type.

Test plan

  • Added unit tests

  • Verified in online cluster for one year(300k ~ 500k sql / day)

== RELEASE NOTES ==
General Changes
* Support Hive scalar function

@yulongfufu yulongfufu force-pushed the support_hive_scalar_function branch 3 times, most recently from b00f144 to 4769c3f Compare September 28, 2021 03:09
@yulongfufu yulongfufu changed the title Support hive scalar function. Support hive scalar function Sep 28, 2021
@yulongfufu yulongfufu marked this pull request as ready for review September 28, 2021 08:26
@yulongfufu
Copy link
Contributor Author

@prithvip @pettyjamesm @v-jizhang Hi, would you mind helping me review this feature?

@rongrong
Copy link
Contributor

Have we thought about implementing this as a separate plugin rather than modifying presto-hive? That would give people more flexibility on enabling the feature. For example, at Facebook, we have multiple catalog using the Hive connector, but we might want to only have a single catalog for hive functions.

@yulongfufu
Copy link
Contributor Author

Have we thought about implementing this as a separate plugin rather than modifying presto-hive?

It is feasible. I would love to revise it If it is necessary.

we have multiple catalog using the Hive connector, but we might want to only have a single catalog for hive functions.

Another idea: Whether use the feature or not is decided by different hive.properties.

presto-hive/pom.xml Outdated Show resolved Hide resolved
@yulongfufu yulongfufu force-pushed the support_hive_scalar_function branch 2 times, most recently from c9c2272 to 183813f Compare October 18, 2021 12:52
@yulongfufu yulongfufu force-pushed the support_hive_scalar_function branch 3 times, most recently from 3f72bf3 to b13633b Compare November 9, 2021 17:29
@yulongfufu yulongfufu force-pushed the support_hive_scalar_function branch 2 times, most recently from 8c09f1d to 1f0cf93 Compare November 13, 2021 15:52
@yulongfufu yulongfufu force-pushed the support_hive_scalar_function branch 2 times, most recently from 63daaf2 to e3b5c68 Compare November 22, 2021 10:57
Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Some final nits. Thanks for working on this!

@yulongfufu yulongfufu force-pushed the support_hive_scalar_function branch 2 times, most recently from 897506c to 16ee2ef Compare December 9, 2021 11:22
yulongfufu and others added 3 commits December 9, 2021 19:58
Introduce HiveFunctionRegistry interface to registry hive function.
Implement SimpleHiveFunctionRegistry to get hive scalar function class.
FunctionRegistry implements a mapping between function names and
function classes like org.apache.hadoop.hive.ql.exec.FunctionRegistry.

Co-authored-by: Todd Gao <todd.gao.2013@gmail.com>
Co-authored-by: zybing <739551801@qq.com>
Co-authored-by: Todd Gao <todd.gao.2013@gmail.com>
Co-authored-by: zybing <739551801@qq.com>
HiveScalarFunctionInvoker bridge Hive scalar function instance.

Co-authored-by: Todd Gao <todd.gao.2013@gmail.com>
Co-authored-by: zybing <739551801@qq.com>
@yulongfufu yulongfufu force-pushed the support_hive_scalar_function branch 3 times, most recently from 2fbbff6 to bc93078 Compare December 11, 2021 17:06
@yulongfufu yulongfufu force-pushed the support_hive_scalar_function branch 4 times, most recently from 375a412 to df0784e Compare December 14, 2021 04:04
Co-authored-by: Todd Gao <todd.gao.2013@gmail.com>
Co-authored-by: zybing <739551801@qq.com>
@rongrong rongrong merged commit 204d3fa into prestodb:master Dec 15, 2021
40 checks passed
@beinan
Copy link
Member

beinan commented Dec 21, 2021

Hello @rongrong & @yulongfufu , I'm seeing the error below after this pr merged. Anything I missed?

java.lang.IllegalStateException: No factory for function namespace manager hive-functions
at com.google.common.base.Preconditions.checkState(Preconditions.java:588)
at com.facebook.presto.metadata.FunctionAndTypeManager.loadFunctionNamespaceManager(FunctionAndTypeManager.java:166)
at com.facebook.presto.metadata.StaticFunctionNamespaceStore.loadFunctionNamespaceManager(StaticFunctionNamespaceStore.java:81)
at com.facebook.presto.metadata.StaticFunctionNamespaceStore.loadFunctionNamespaceManagers(StaticFunctionNamespaceStore.java:63)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:165)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:85)

@beinan
Copy link
Member

beinan commented Dec 21, 2021

nvm, I found I need ../presto-hive-function-namespace/pom.xml in the bundle

@yohengyang
Copy link

yohengyang commented Jan 14, 2022

@yulongfufu I tested the pmod method and an exception occurred

check(selectF("pmod", "c_integer", "c_integer"), INTEGER, 1L);

The error stack is as follows, do you know how to fix it?

java.lang.RuntimeException: HIVE_FUNCTION_INITIALIZATION_ERROR

	at com.facebook.presto.tests.AbstractTestingPrestoClient.execute(AbstractTestingPrestoClient.java:124)
	at com.facebook.presto.tests.AbstractTestingPrestoClient.execute(AbstractTestingPrestoClient.java:89)
	at com.facebook.presto.hive.functions.TestHiveScalarFunctions.check(TestHiveScalarFunctions.java:150)
	at com.facebook.presto.hive.functions.TestHiveScalarFunctions.genericFunction(TestHiveScalarFunctions.java:59)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
	at org.testng.TestRunner.privateRun(TestRunner.java:756)
	at org.testng.TestRunner.run(TestRunner.java:610)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:387)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:382)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:340)
	at org.testng.SuiteRunner.run(SuiteRunner.java:289)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1293)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1218)
	at org.testng.TestNG.runSuites(TestNG.java:1133)
	at org.testng.TestNG.run(TestNG.java:1104)
	at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66)
	at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:110)
Caused by: com.facebook.presto.spi.PrestoException: HIVE_FUNCTION_INITIALIZATION_ERROR
	at com.facebook.presto.hive.functions.HiveFunctionErrorCode.initializationError(HiveFunctionErrorCode.java:88)
	at com.facebook.presto.hive.functions.scalar.HiveScalarFunctionInvoker.createFunctionInvoker(HiveScalarFunctionInvoker.java:114)
	at com.facebook.presto.hive.functions.scalar.HiveScalarFunction.createHiveScalarFunction(HiveScalarFunction.java:62)
	at com.facebook.presto.hive.functions.HiveFunctionNamespaceManager.initializeFunction(HiveFunctionNamespaceManager.java:198)
	at com.facebook.presto.hive.functions.HiveFunctionNamespaceManager.resolveFunction(HiveFunctionNamespaceManager.java:148)
	at com.facebook.presto.metadata.FunctionAndTypeManager.resolveFunctionInternal(FunctionAndTypeManager.java:535)
	at com.facebook.presto.metadata.FunctionAndTypeManager.resolveFunction(FunctionAndTypeManager.java:365)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer.resolveFunction(ExpressionAnalyzer.java:1512)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.visitFunctionCall(ExpressionAnalyzer.java:944)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.visitFunctionCall(ExpressionAnalyzer.java:323)
	at com.facebook.presto.sql.tree.FunctionCall.accept(FunctionCall.java:136)
	at com.facebook.presto.sql.tree.StackableAstVisitor.process(StackableAstVisitor.java:26)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.process(ExpressionAnalyzer.java:346)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer.analyze(ExpressionAnalyzer.java:289)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer.analyzeExpression(ExpressionAnalyzer.java:1605)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.analyzeExpression(StatementAnalyzer.java:2564)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.analyzeSelect(StatementAnalyzer.java:2389)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:1495)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:316)
	at com.facebook.presto.sql.tree.QuerySpecification.accept(QuerySpecification.java:138)
	at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:330)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:340)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:1080)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:316)
	at com.facebook.presto.sql.tree.Query.accept(Query.java:105)
	at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:330)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:308)
	at com.facebook.presto.sql.analyzer.Analyzer.analyzeSemantic(Analyzer.java:89)
	at com.facebook.presto.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:189)
	at com.facebook.presto.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:97)
	at com.facebook.presto.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:796)
	at com.facebook.presto.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:126)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseNumeric.initialize(GenericUDFBaseNumeric.java:107)
	at com.facebook.presto.hive.functions.scalar.HiveScalarFunctionInvoker.createFunctionInvoker(HiveScalarFunctionInvoker.java:84)
	... 38 more

@yulongfufu
Copy link
Contributor Author

The error stack is as follows, do you know how to fix it?

ohh...I remember it because we met it before. The pmod fails to initialize because it extends the GenericUDFBaseNumeric class, whose initialization method need SessionState.get().getConf(), but we can't get the SessionState in presto by this implementation. It is also a problem in our company's version, a similar problem exists with all functions extend the GenericUDFBaseNumeric class. If you want to support pmod by this feature, you can set confLookupNeeded as false by setConfLookupNeeded method to skip the SessionState logic.
btw, It is not a difficult function, right? maybe you can implement it like mod in presto (the mod function in presto has a little bit different with the pmod in hive).
Thank you very much! @yohengyang

@yohengyang
Copy link

@yulongfufu
Thanks for the help, I'm porting this patch to trino and it almost worked. A few more questions:

  • How to solve the Denpendency problem for hive-3.x, I read the documents you wrote, but I still don't know what to do.
    The root cause is that 3.x doesn't shade the jar that get_splits function needs, I tried to directly introduce hive-llap-tez dependency in presto-hive-function-namespace module, but there are a lot of dependency conflicts, I am not sure I do is it right or not. What exactly does Rewrite Functionregistry in presto mean?
  • Will you continue to update the unfinished functions in roadmap?

@dwshmilyss
Copy link

dwshmilyss commented Oct 17, 2022

when i ran presto , i got this error
Function hive.default.concat not registered
my presto version is 0.275, and detail excepiton is

com.facebook.presto.sql.analyzer.SemanticException: line 1:8: Function hive.default.isnull not registered
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer.resolveFunction(ExpressionAnalyzer.java:1622)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.visitFunctionCall(ExpressionAnalyzer.java:959)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.visitFunctionCall(ExpressionAnalyzer.java:330)
	at com.facebook.presto.sql.tree.FunctionCall.accept(FunctionCall.java:136)
	at com.facebook.presto.sql.tree.StackableAstVisitor.process(StackableAstVisitor.java:26)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.process(ExpressionAnalyzer.java:353)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer.analyze(ExpressionAnalyzer.java:296)
	at com.facebook.presto.sql.analyzer.ExpressionAnalyzer.analyzeExpression(ExpressionAnalyzer.java:1711)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.analyzeExpression(StatementAnalyzer.java:2627)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.analyzeSelect(StatementAnalyzer.java:2452)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:1511)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:321)
	at com.facebook.presto.sql.tree.QuerySpecification.accept(QuerySpecification.java:138)
	at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:335)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:343)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:1089)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:321)
	at com.facebook.presto.sql.tree.Query.accept(Query.java:105)
	at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:335)
	at com.facebook.presto.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:313)
	at com.facebook.presto.sql.analyzer.Analyzer.analyzeSemantic(Analyzer.java:93)
	at com.facebook.presto.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:196)
	at com.facebook.presto.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:100)
	at com.facebook.presto.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:828)
	at com.facebook.presto.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:127)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
	```

@rongrong
Copy link
Contributor

Did you configure the function namespace? I don't think it enabled by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants