Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(flink): implement UDF support for the backend #8142

Merged
merged 12 commits into from
Feb 20, 2024

Conversation

deepyaman
Copy link
Contributor

@deepyaman deepyaman commented Jan 29, 2024

Description of changes

  • Add UDF translation rules to the Flink backend (copied from the base SQL backend, because Flink doesn't extend it)
    • Maybe this could be cleaner/not require copy-pasting?
  • Add a test for a built-in scalar UDF
  • Add a test for a built-in aggregation UDF
  • Implement scalar Python UDFs
  • Implement scalar pandas UDFs

Issues closed

Resolves #8108

@deepyaman deepyaman force-pushed the feat/flink/builtin-scalar-udfs branch from 9e93056 to 8dae770 Compare January 29, 2024 22:19
@cpcloud
Copy link
Member

cpcloud commented Jan 29, 2024

Any chance we can hold off on this for the 8.0 release and target the-epic-split? Flink isn't yet ported (working on it now) so there's nothing to code against for the moment.

@deepyaman
Copy link
Contributor Author

deepyaman commented Jan 29, 2024

Any chance we can hold off on this for the 8.0 release and target the-epic-split? Flink isn't yet ported (working on it now) so there's nothing to code against for the moment.

Sure! I don't think there's any rush on this.

Maybe I'll just keep this branch for now, implement more UDF stuff (since you mentioned TES doesn't affect UDF too much), and then will make whatever changes and/or rebase on top of the new branch, raising a new PR if need be?

@deepyaman
Copy link
Contributor Author

UDF tests pass locally:

(ibis-dev) deepyaman@Deepyamans-Air ibis % poetry run pytest --junitxml=junit.xml --cov=ibis --cov-report=xml:coverage.xml -m flink --randomly-dont-reorganize ibis/backends/tests/test_udf.py 
============================================================================================= test session starts ==============================================================================================
platform darwin -- Python 3.10.13, pytest-7.4.3, pluggy-1.3.0
Using --randomly-seed=3454285181
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/deepyaman/github/claypotai/ibis
configfile: pyproject.toml
plugins: repeat-0.9.2, hypothesis-6.88.1, snapshot-0.0.0, pytest_httpserver-1.0.8, randomly-3.15.0, cov-4.1.0, clarity-1.0.1, mock-3.12.0, xdist-3.3.1, benchmark-4.0.0
collected 100 items / 95 deselected / 5 selected                                                                                                                                                               

ibis/backends/tests/test_udf.py Exception ignored in: <_io.FileIO name=26 mode='wb' closefd=True>
Traceback (most recent call last):
  File "/opt/miniconda3/envs/ibis-dev/lib/python3.10/site-packages/_pytest/unraisableexception.py", line 59, in __exit__
    del self.unraisable
ResourceWarning: unclosed file <_io.BufferedWriter name=26>
Exception ignored in: <_io.FileIO name=27 mode='rb' closefd=True>
Traceback (most recent call last):
  File "/opt/miniconda3/envs/ibis-dev/lib/python3.10/site-packages/_pytest/unraisableexception.py", line 59, in __exit__
    del self.unraisable
ResourceWarning: unclosed file <_io.BufferedReader name=27>
.xx.x                                                                                                                                                                    [100%]

--------------------------------------------------------------------- generated xml file: /Users/deepyaman/github/claypotai/ibis/junit.xml ---------------------------------------------------------------------

--------- coverage: platform darwin, python 3.10.13-final-0 ----------
Coverage XML written to file coverage.xml

================================================================================= 2 passed, 95 deselected, 3 xfailed in 24.28s =================================================================================

@deepyaman deepyaman changed the title feat(flink): add support for built-in, scalar UDFs feat(flink): add UDF support, excluding custom agg Feb 9, 2024
@deepyaman deepyaman force-pushed the feat/flink/builtin-scalar-udfs branch from 095d3b4 to 89825a3 Compare February 16, 2024 04:47
@deepyaman deepyaman force-pushed the feat/flink/builtin-scalar-udfs branch from c9ce8b8 to 1a83219 Compare February 16, 2024 19:14
@deepyaman deepyaman changed the title feat(flink): add UDF support, excluding custom agg feat(flink): implement UDF support for the backend Feb 16, 2024
@deepyaman deepyaman marked this pull request as ready for review February 16, 2024 20:25
@cpcloud cpcloud added this to the 9.0 milestone Feb 20, 2024
@cpcloud cpcloud added feature Features or general enhancements udf Issues related to user-defined functions flink Issues or PRs related to Flink labels Feb 20, 2024
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 it!

@cpcloud
Copy link
Member

cpcloud commented Feb 20, 2024

@deepyaman Can we wait for #8398 to be merged? You'll be able to delete some boilerplate in this PR once that one's in.

@gforsyth
Copy link
Member

#8398 is in now

@deepyaman
Copy link
Contributor Author

@deepyaman Can we wait for #8398 to be merged? You'll be able to delete some boilerplate in this PR once that one's in.

Sounds good! I'll update this afternoon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to revert the changes in this file.

Copy link
Contributor Author

@deepyaman deepyaman Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the other changes are still valid, but I've reverted the (overriding) fallback behaviors.

@cpcloud cpcloud enabled auto-merge (squash) February 20, 2024 22:36
@cpcloud
Copy link
Member

cpcloud commented Feb 20, 2024

Thanks!

@cpcloud cpcloud merged commit a3b1cc6 into ibis-project:main Feb 20, 2024
74 checks passed
@deepyaman deepyaman deleted the feat/flink/builtin-scalar-udfs branch February 21, 2024 01:35
ncclementi pushed a commit to ncclementi/ibis that referenced this pull request Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements flink Issues or PRs related to Flink udf Issues related to user-defined functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat(flink): implement UDF support for the backend
3 participants