Add support for flink udfs #5

Open
amalakar opened this issue Jan 24, 2019 · 4 comments

Comments

@amalakar

I think Transport is an excellent project that lets us use the same UDF implementation across various query engines. This is a request to add support for Flink UDFs, which would increase the usability of this project for stream compute use cases as well.

@wmoustafa
Contributor

Arup, thank you for considering Transport. Supporting Flink UDFs sounds like a great idea. Please feel free to create a pull request. We will be happy to answer any questions, and work with you to get the code checked in.

@zacharywhitley

I've got some other platforms that I'd like to target as well: Apache Jena, Confluent KSQL, OpenFaaS, etc. I'll dig into the code first, but do you have any pointers on where to start, or is there a better place than right here to ask questions?

@rdsr

rdsr commented Apr 11, 2019

@zacharywhitley There's a user guide here: https://github.com/linkedin/transport/tree/master/transportable-udfs-documentation and a few examples here: https://github.com/linkedin/transport/tree/master/transportable-udfs-examples to get you started. Let us know if you need any help.

@wmoustafa
Contributor

wmoustafa commented Apr 11, 2019

@zacharywhitley Thanks for your interest. We are in the process of creating a mailing list and referencing it from the README. To implement additional platforms, please see these classes:

https://github.com/linkedin/transport/blob/master/transportable-udfs-presto/src/main/java/com/linkedin/transport/presto/StdUdfWrapper.java
https://github.com/linkedin/transport/blob/master/transportable-udfs-hive/src/main/java/com/linkedin/transport/hive/StdUdfWrapper.java

These are the classes that make Transport UDF classes look like native UDF classes to the engine. They contain a couple of abstract methods that still need to be implemented by a child class. Here are two example child classes, one for Presto and one for Hive. They are UDF-specific.
https://github.com/linkedin/transport/blob/master/transportable-udfs-examples/transportable-udfs-example-udfs-presto/src/main/java/com/linkedin/transport/examples/presto/MapFromTwoArraysFunctionWrapper.java

https://github.com/linkedin/transport/blob/master/transportable-udfs-examples/transportable-udfs-example-udfs-hive/src/main/java/com/linkedin/transport/examples/hive/MapFromTwoArraysFunctionWrapper.java

Today these small wrappers (such as the ones in the last two links) still have to be provided manually by the user so that the framework can produce the engine-specific jars, but there is ongoing work (very close to finishing) to generate them automatically using a Gradle plugin. You can follow the progress here: #10

Ideally, when support for a platform is added, its wrapper auto-generation plugin should be added as well. But for now, adding support in the form of a StdUdfWrapper can be a great start. Please let us know if you have any questions.
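
For anyone new to the layout described above, here is a rough, self-contained Java sketch of the general pattern: an engine-facing abstract wrapper plus a tiny UDF-specific child class. Every name in it (`PortableUdf`, `EngineUdfWrapper`, `createUdf`, `invoke`, and so on) is hypothetical and chosen only for illustration. It is not the Transport API and not the real `StdUdfWrapper` signature, and it does not use any Flink, Presto, or Hive classes. Please check the linked sources for the actual abstract methods.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical engine-agnostic UDF contract (an assumption, NOT the Transport API).
interface PortableUdf {
    String name();
    Object eval(List<Object> args);
}

// UDF logic written once, with no dependency on any engine.
class MapFromTwoArrays implements PortableUdf {
    @Override
    public String name() {
        return "map_from_two_arrays";
    }

    @Override
    public Object eval(List<Object> args) {
        List<?> keys = (List<?>) args.get(0);
        List<?> values = (List<?>) args.get(1);
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        // LinkedHashMap keeps insertion order, so output is deterministic.
        Map<Object, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            result.put(keys.get(i), values.get(i));
        }
        return result;
    }
}

// Hypothetical stand-in for a per-engine wrapper base class.
// A real one would extend the engine's native UDF type and convert
// between engine data types and the portable ones.
abstract class EngineUdfWrapper {
    private PortableUdf udf; // created lazily on first call

    // The UDF-specific part, supplied by a small child class.
    protected abstract PortableUdf createUdf();

    // Entry point the (imaginary) engine would call for each row.
    public final Object invoke(Object... engineArgs) {
        if (udf == null) {
            udf = createUdf();
        }
        return udf.eval(Arrays.asList(engineArgs));
    }
}

// The kind of small per-UDF child class that is written by hand today.
class MapFromTwoArraysWrapper extends EngineUdfWrapper {
    @Override
    protected PortableUdf createUdf() {
        return new MapFromTwoArrays();
    }
}

class WrapperSketch {
    public static void main(String[] args) {
        EngineUdfWrapper wrapper = new MapFromTwoArraysWrapper();
        Object result = wrapper.invoke(Arrays.asList("a", "b"), Arrays.asList(1, 2));
        System.out.println(result); // prints {a=1, b=2}
    }
}
```

Under these assumptions, supporting a new engine such as Flink would mean writing one new base class in the role of `EngineUdfWrapper`, while the UDF logic itself stays untouched. The per-UDF child class is small and mechanical, which is presumably why it is a good candidate for generation by a build plugin.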
