-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT: Introduce vectorized UDF api #2047
Conversation
2192c5f
to
3970e93
Compare
ibis/expr/operations.py
Outdated
| """Node for element wise UDF. | ||
| """ | ||
|
|
||
| func = Arg(rlz.noop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type these any stronger?
| ), | ||
| id='int_float', | ||
| ), | ||
| pytest.param( | ||
| dt.int64, | ||
| True, | ||
| id='int_bool', | ||
| marks=pytest.mark.xfail( | ||
| raises=com.IbisTypeError, | ||
| marks=pytest.mark.skip( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you changing to skips? xfail are more appropriate here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually not sure what to do with these tests. They test a very small/unimportant use case of the UDF - calling UDF directly on a Scalar, e.g., my_udf(1).execute() which is same as my_udf.func(1) + type checking. And they are hard to fix. (I have spent half day trying to fix them but failed).
Given that this is a very small use case I was thinking not to spend too much time on it. Also, type checking / promoting is not well supported for UDFs now anyway.
The being said, I could give it another attempt if you feel this is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok with not-fixing, but then can you segregate the failing ones and instead of skipping, explicity check that they produce the correct error / type mismatch. and add a comment. skipping tests basically get ignored forever :-<
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. a couple of test comments. can you add a note in the whatsnew. a followon PR for docs would be great (or in this PR ok too!). ping on green.
| ), | ||
| id='int_float', | ||
| ), | ||
| pytest.param( | ||
| dt.int64, | ||
| True, | ||
| id='int_bool', | ||
| marks=pytest.mark.xfail( | ||
| raises=com.IbisTypeError, | ||
| marks=pytest.mark.skip( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok with not-fixing, but then can you segregate the failing ones and instead of skipping, explicity check that they produce the correct error / type mismatch. and add a comment. skipping tests basically get ignored forever :-<
81f0cc4
to
de68071
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @icexelloss
|
let's followup with an example in the docs proper (pls create an issue for this). |
One of Ibis' design goal is to have one common expression language that works for multiple backend execution engines. However, the current UDF api contradicts this design goal because the UDF is defined under per backend namespace (e.g.,
ibis.pandas.udf) and therefore, ibis program that contains UDF cannot move to another backend.I propose to introduce a top level ibis UDF api,
ibis.udf.vectorizedto address this issue. Vectorized UDFs take vectorized data structure as input, this currently meansnumpy.ndarraypd.Seriesandpd.DataFrameand return Python scalar or vectorized data structure as output.This is proposed as a very experimental API.
In this PR, I have made the following changes in particular:
ibis.udf.vectorizednamespaceibis.pandas.udfto useibis.udf.vectorizeddirectlyibis.udf.vectorizedibis/tests/all/test_vectorized_udf.pyThe following is not included in the PR and left as future work
ibis.udf.vectorizedAPIibis.pandas.udf)This PR addresses issue #2048