BUG: PySpark compiler cannot compile elementwise UDF for some output types #2223
Can you add a test that hits this (and checks the resulting error message / type)? Also, please add a release note.
@@ -1476,7 +1477,7 @@ def compile_not_null(t, expr, scope, **kwargs):
 @compiles(ops.ElementWiseVectorizedUDF)
 def compile_elementwise_udf(t, expr, scope):
     op = expr.op()
-    spark_output_type = ibis_dtype_to_spark_dtype(op._output_type)
+    spark_output_type = spark_dtype(op._output_type)
Why this change? What's the difference between ibis_dtype_to_spark_dtype and spark_dtype? It is confusing
Basically, ibis_dtype_to_spark_dtype doesn't handle all Ibis DataTypes
spark_dtype is the "main"/generic function that converts an Ibis type to a Spark type
ibis_dtype_to_spark_dtype is one of the many functions that overloads it—This one's used specifically when the argument is an Ibis DataType in this list, but won't handle anything else
More info (getting into the details):
"this list" = All the Ibis DataTypes and Spark types that can be "trivially" converted in either direction
There are some Ibis DataType subclasses that are not in the list. E.g. Ibis Decimal, because converting a Spark DecimalType to an Ibis Decimal is not "trivial" (can't just return dt.Decimal() and call it a day—we need to construct an Ibis Decimal with the correct precision and scale from the original Spark DecimalType). Essentially, ibis_dtype_to_spark_dtype covers most Ibis DataType subclasses but has exceptions. But using spark_dtype would cover those exceptions
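To make the distinction concrete, here is a minimal sketch of the pattern described above. It does not use the real Ibis or PySpark classes: every class and function name below is a hypothetical stand-in, and it uses the standard library's functools.singledispatch in place of whatever dispatch mechanism spark_dtype actually uses. The NotImplementedError is also an assumption, not necessarily the error the real code raises.

```python
from dataclasses import dataclass
from functools import singledispatch


# Hypothetical stand-ins for Ibis DataTypes (not the real ibis classes).
class IbisType: ...
class IbisInt64(IbisType): ...
class IbisString(IbisType): ...


@dataclass
class IbisDecimal(IbisType):
    precision: int
    scale: int


# Hypothetical stand-ins for Spark types (not the real pyspark classes).
class SparkLong: ...
class SparkString: ...


@dataclass
class SparkDecimal:
    precision: int
    scale: int


# "This list": types that convert trivially, with no parameters to carry over.
TRIVIAL = {IbisInt64: SparkLong, IbisString: SparkString}


def trivial_to_spark(ibis_type):
    """Plays the role of ibis_dtype_to_spark_dtype: table lookup only."""
    try:
        return TRIVIAL[type(ibis_type)]()
    except KeyError:
        raise NotImplementedError(
            f"{type(ibis_type).__name__} is not in the trivial-conversion list"
        )


@singledispatch
def to_spark(ibis_type):
    """Plays the role of spark_dtype: the generic entry point."""
    raise NotImplementedError(f"cannot convert {type(ibis_type).__name__}")


@to_spark.register(IbisType)
def _generic_to_spark(ibis_type):
    # Default overload for any Ibis type: fall back to the trivial table.
    return trivial_to_spark(ibis_type)


@to_spark.register(IbisDecimal)
def _decimal_to_spark(ibis_type):
    # Non-trivial overload: precision and scale must be carried across.
    return SparkDecimal(ibis_type.precision, ibis_type.scale)
```

In this sketch, calling trivial_to_spark(IbisDecimal(10, 2)) directly fails, while to_spark(IbisDecimal(10, 2)) picks the more specific overload and succeeds. That mirrors why the compiler should go through the generic function.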
It's a little confusing, I'll see if I can come up with function names that are clearer. It is tricky though
Also I just realized Timestamp I think actually belongs in the list of types that ibis_dtype_to_spark_dtype handles because both ways the conversion is trivial. I'll refactor that
I won't make any changes to the function names in this PR because I'd have to slightly refactor code that's unrelated to this bugfix if I do
I think the best action though (for a separate PR) is to rename ibis_dtype_to_spark_dtype to _ibis_dtype_to_spark_dtype (basically, shouldn't be used outside of that module because it isn't clear exactly what types it can handle), and replace usages of ibis_dtype_to_spark_dtype with spark_dtype
…park Timestamp, let existing functions spark_dtype_to_ibis_dtype and ibis_dtype_to_spark_dtype handle conversion
I was taking a stab at this but I think this isn't big enough to add a test for. Elementwise UDFs in PySpark with most I was thinking of writing tests for I could also write a set of tests, where each test verifies that an elementwise UDF with XXXX Let me know what you think
thanks @timothydijamco
The PySpark compiler will raise an error when trying to compile an ElementWiseVectorizedUDF node whose output_type argument is not in this list (e.g. Decimal and TimestampType are not in the list).
This PR:
- Fixes this (uses spark_dtype instead of ibis_dtype_to_spark_dtype)
- Enables spark_dtype to convert an Ibis TimestampType to a Spark TimestampType (adds another dispatch function to spark_dtype)
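As a rough illustration of the second point, the sketch below shows what "adding another dispatch function" amounts to: a compile step that fails for a timestamp output type until a conversion is registered for it. All names here are hypothetical stand-ins rather than the real Ibis or PySpark API, the dispatcher is functools.singledispatch from the standard library, and the NotImplementedError is an assumption about the failure mode.

```python
from functools import singledispatch


class IbisTimestamp: ...        # hypothetical stand-in for the Ibis timestamp type
class SparkTimestampType: ...   # hypothetical stand-in for Spark's TimestampType


@singledispatch
def to_spark(ibis_type):
    """Stand-in for the generic converter; unknown types are rejected."""
    raise NotImplementedError(
        f"no Spark conversion registered for {type(ibis_type).__name__}"
    )


def compile_udf_output_type(output_type):
    """Stand-in for the part of the UDF compile rule that converts the output type."""
    return to_spark(output_type)


# Before the new dispatch function exists, compiling a timestamp-returning UDF fails.
try:
    compile_udf_output_type(IbisTimestamp())
    failed_before = False
except NotImplementedError:
    failed_before = True


# The fix, in miniature: register one more overload on the generic converter.
@to_spark.register(IbisTimestamp)
def _timestamp_to_spark(ibis_type):
    return SparkTimestampType()


converted = compile_udf_output_type(IbisTimestamp())
```

The compile step itself does not change between the two calls; only the set of registered conversions does.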