feat(spark): add some numeric function mappings#317
Blizzara merged 1 commit into substrait-io:main
Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
Blizzara left a comment
Thanks! Approved, but with a note on the StddevSamp mapping.
      s[Max]("max"),
      s[First]("any_value"),
-     s[HyperLogLogPlusPlus]("approx_count_distinct")
+     s[HyperLogLogPlusPlus]("approx_count_distinct"),
Technically, the "approx_count_distinct" definition says it's HyperLogLog while this is HLL++, but given that the ++ variant should just be better, that's probably fine!
      s[First]("any_value"),
-     s[HyperLogLogPlusPlus]("approx_count_distinct")
+     s[HyperLogLogPlusPlus]("approx_count_distinct"),
+     s[StddevSamp]("std_dev")
Substrait's std_dev needs an option to specify whether it's sample-based or population-based: https://github.com/substrait-io/substrait/blob/9cccb04fba336489b70ed42b71f73a0a1e34f9f5/extensions/functions_arithmetic.yaml#L1335. Spark has both, as StddevSamp and StddevPop.
Still, I think this is fine to merge like this for now; just noting that plans produced without the option may not work as expected in other systems, or with Spark in the future.
I have some code for handling the options which should work here; I'll try to push it up soon.
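To illustrate the concern, here is a minimal sketch of why the option matters when a plan is read back. It is not the code mentioned above and does not use the converter's real types: `SubstraitCall`, `StdDevMapping`, and `toSpark` are hypothetical names, and the option name `distribution` with values `SAMPLE` and `POPULATION` is an assumption based on the linked YAML.

```scala
// Hypothetical stand-in for a Substrait function call: a name plus its options.
final case class SubstraitCall(name: String, options: Map[String, String] = Map.empty)

object StdDevMapping {
  // Keys are Spark expression class names. The option name and values are
  // assumed from the linked functions_arithmetic.yaml, not taken from this PR.
  val aggregates: Map[String, SubstraitCall] = Map(
    "StddevSamp" -> SubstraitCall("std_dev", Map("distribution" -> "SAMPLE")),
    "StddevPop"  -> SubstraitCall("std_dev", Map("distribution" -> "POPULATION"))
  )

  // Reverse lookup: returns the Spark expression only when the call,
  // including its options, matches exactly one known mapping.
  def toSpark(call: SubstraitCall): Option[String] =
    aggregates.collectFirst { case (spark, c) if c == call => spark }
}
```

With this shape, a bare `SubstraitCall("std_dev")` resolves to nothing, which mirrors the point that a plan emitted without the option leaves the consumer to guess between the sample and population variants.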
      "q70", "q71", "q73", "q76", "q77", "q79",
      "q80", "q81", "q82", "q85", "q86", "q87", "q88",
      "q90", "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
      val failingSQL: Set[String] = Set(
No description provided.