feat(spark): bitwise functions #309

Merged
Blizzara merged 1 commit into substrait-io:main from andrew-coleman:bitwise
Oct 25, 2024

@andrew-coleman
Member

Adds support in the spark module for 8-bit and 16-bit integer types and for some bitwise functions. The catalyst optimizer generates expressions using these for certain query types.

Note that shift_right (and other bit-shifting functions) may be worth considering for the core Substrait function catalog, but it has been added here (temporarily?) as a Spark extension, pending a longer-term discussion and decision on their wider utility.

extends TypeVisitor.TypeThrowsVisitor[DataType, RuntimeException]("Unknown expression type.") {

override def visit(expr: Type.I8): DataType = ByteType
override def visit(expr: Type.I16): DataType = ShortType
Contributor

while you're at it, mind adding these also to ToSparkType?

Member Author

This is in ToSparkType. The conversion in ToSubstraitType is already there further down this file. It converts both ways.

Contributor

Ah right, all good then!
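
For readers following this thread, here is a minimal sketch of the two-way mapping being discussed. It does not use the real Spark or Substrait classes: the types below are hypothetical stand-ins named after them, and `toSpark` / `toSubstrait` are illustrative names for the roles that ToSparkType and ToSubstraitType play, not the module's actual code.

```scala
object TypeMappingSketch {
  // Hypothetical stand-ins for Substrait's integer types (Type.I8, Type.I16, ...).
  sealed trait SubstraitType
  case object I8 extends SubstraitType
  case object I16 extends SubstraitType
  case object I32 extends SubstraitType
  case object I64 extends SubstraitType

  // Hypothetical stand-ins for Spark's ByteType, ShortType, IntegerType, LongType.
  sealed trait SparkType
  case object ByteType extends SparkType
  case object ShortType extends SparkType
  case object IntegerType extends SparkType
  case object LongType extends SparkType

  // Substrait -> Spark: the direction shown in the diff above.
  def toSpark(t: SubstraitType): SparkType = t match {
    case I8  => ByteType
    case I16 => ShortType
    case I32 => IntegerType
    case I64 => LongType
  }

  // Spark -> Substrait: the reverse direction mentioned in the reply.
  def toSubstrait(t: SparkType): SubstraitType = t match {
    case ByteType    => I8
    case ShortType   => I16
    case IntegerType => I32
    case LongType    => I64
  }
}
```

With both directions defined, a type should survive a round trip, e.g. `toSubstrait(toSpark(I8))` yields `I8` in this sketch.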

}

override def visit(expr: SExpression.I16Literal): Expression = {
Literal(expr.value().asInstanceOf[Short], ToSubstraitType.convert(expr.getType))
Contributor

and here also the other direction for the conversion?

Member Author

It's already covered in the other direction in ToSubstraitExpression on the following line:
case SubstraitLiteral(substraitLiteral) => Some(substraitLiteral)
The unapply method in this object invokes the conversion.
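
To illustrate how a single `case SubstraitLiteral(...)` line can cover every literal type, here is a rough sketch of a Scala extractor object whose `unapply` does the conversion. All types here (`SparkLiteral`, `I8Lit`, `I16Lit`) are hypothetical stand-ins, and the body of `unapply` is an assumption about the general shape, not the module's real implementation.

```scala
object LiteralExtractorSketch {
  // Hypothetical stand-in for a Spark literal expression.
  final case class SparkLiteral(value: Any)

  // Hypothetical stand-ins for Substrait literal expressions.
  sealed trait SubstraitLit
  final case class I8Lit(value: Byte) extends SubstraitLit
  final case class I16Lit(value: Short) extends SubstraitLit

  // Extractor object: pattern matching on SubstraitLiteral(x) calls unapply,
  // so the per-type conversion lives in one place.
  object SubstraitLiteral {
    def unapply(lit: SparkLiteral): Option[SubstraitLit] = lit.value match {
      case b: Byte  => Some(I8Lit(b))
      case s: Short => Some(I16Lit(s))
      case _        => None // unsupported literal type in this sketch
    }
  }

  // The caller needs only one case, mirroring the line quoted above.
  def translate(expr: SparkLiteral): Option[SubstraitLit] = expr match {
    case SubstraitLiteral(substraitLiteral) => Some(substraitLiteral)
    case _                                  => None
  }
}
```

Under this shape, supporting a new literal type means adding a case inside `unapply`; the match in `translate` stays unchanged.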

"org.apache.spark.sql.catalyst.expressions.PromotePrecision") =>
translateUp(p.children.head)
case CaseWhen(branches, elseValue) => translateCaseWhen(branches, elseValue)
case InSet(value, set) => translateIn(value, set.toSeq.map(v => Literal(v)))
Contributor

nit: mind moving this next to the case In(..)?

Member Author

Sure, although InSet needs to come before ScalarFunction; otherwise it matches the latter (since it's a UnaryExpression). I've moved In up the list so they are together.
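
The ordering constraint can be shown with a small sketch. The expression types and the two rules below are hypothetical stand-ins (the real code matches on Spark's Expression classes inside one `match`); the rules are modelled as an ordered list of partial functions so that both orderings can be compared side by side.

```scala
object MatchOrderSketch {
  // Hypothetical stand-ins for Spark expression classes.
  sealed trait Expr
  sealed trait UnaryExpr extends Expr { def child: Expr }
  final case class Attr(name: String) extends Expr
  final case class InSet(child: Expr, set: Set[Any]) extends UnaryExpr
  final case class BitwiseNot(child: Expr) extends UnaryExpr

  type Rule = PartialFunction[Expr, String]

  // Specific rule: only matches InSet.
  val inSetRule: Rule = { case InSet(_, set) => s"in_set(${set.size} values)" }

  // General rule: matches any unary expression, which includes InSet.
  val scalarFunctionRule: Rule = { case _: UnaryExpr => "scalar_function" }

  // Apply the first rule that matches, like the cases of a match expression.
  def translate(rules: Seq[Rule])(e: Expr): Option[String] =
    rules.find(_.isDefinedAt(e)).map(rule => rule(e))
}
```

In this sketch, `translate(Seq(inSetRule, scalarFunctionRule))` routes an `InSet` to the specific rule, while the reversed order lets the general rule swallow it, which is the behaviour described in the reply above.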

Params:
base – the base number to shift.
shift – number of bits to right shift.
impls:
Contributor

It looks like in Spark the base can be either int or long, and the return type is set accordingly. I think we should add both options?

Quickly testing this, it works for `select shiftright(col, 2) from (values (bigint(1234)) as table(col))` but not for `select shiftright(col, 2) from (values (1234) as table(col))`, so yep, I think we need to list both versions here.
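
As a rough illustration of the two variants suggested here, the sketch below defines one overload per base type, with the return type following the base. `shiftRight` is a hypothetical helper, not the extension's definition, and it assumes the shift is sign-preserving (Scala's `>>`), which should be checked against Spark's shiftright before relying on it.

```scala
object ShiftRightSketch {
  // 32-bit variant: int base, int shift, int result.
  def shiftRight(base: Int, shift: Int): Int = base >> shift

  // 64-bit variant: long base, int shift, long result.
  def shiftRight(base: Long, shift: Int): Long = base >> shift
}
```

For example, `ShiftRightSketch.shiftRight(1234, 2)` and `ShiftRightSketch.shiftRight(1234L, 2)` both give 308, as an Int and a Long respectively.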

Contributor

Overall I think it's fine to add the function here initially, but it'd be good to also file the PR/Issue on the core functions since this seems general enough, I think :)

Member Author

I agree, although that might take a bit longer ;)

@andrew-coleman
Member Author

I'll fix up the conflict :)

Adds support in the spark module for 8-bit and 16-bit integer types and for some bitwise functions.
The catalyst optimizer generates expressions using these for certain query types.

Note that `shift_right` (and other bit shifting functions) might want to be considered for the
core substrait function catalog, but it has been added here (temporarily?) as spark extension
pending a longer term discussion/decision on their wider utility.

Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
Contributor

@Blizzara Blizzara left a comment

Thanks!

@Blizzara Blizzara merged commit b8ccd8b into substrait-io:main Oct 25, 2024
@andrew-coleman andrew-coleman deleted the bitwise branch October 25, 2024 08:25