[spark] queries with boolean operator chain failure on Spark #835

@mbwhite

Queries that contain a chain of boolean expressions, e.g. x and y and z, are failing in the SubstraitSpark handling, which assumes that only 2 arguments are valid.

Caused by: org.apache.spark.sql.AnalysisException: [WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `and` requires 2 parameters but the actual number is 3. Please, refer to 'https://spark.apache.org/docs/latest/sql-ref-functions.html' for a fix. SQLSTATE: 42605

The boolean functions in the Substrait extensions are declared as variadic, so producing a plan like this is valid.

      "scalarFunction": {
          "functionReference": 1,  # << AND
          "outputType": {
              "bool": {
                  "nullability": "NULLABILITY_NULLABLE"
              }
          },
          "arguments": [
              {
                  "value": {
                
                  }
              },
              {
                  "value": {
                
                  }
              },
              {
                  "value": {
                
                  }
              }
          ]
      }
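
One possible way to handle this on the Spark side would be to fold the N arguments into nested two-argument ANDs before building the Spark expression. The sketch below only illustrates that folding. `Column`, `BinaryAnd`, `fold_variadic_and` and `render` are hypothetical names made up for the example, not substrait-java or Spark APIs, and rejecting an empty argument list is a simplifying assumption.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Sequence, Union


@dataclass(frozen=True)
class Column:
    """Hypothetical leaf expression standing in for an already-converted argument."""
    name: str


@dataclass(frozen=True)
class BinaryAnd:
    """Hypothetical two-argument AND, mirroring the arity Spark's `and` accepts."""
    left: "Expr"
    right: "Expr"


Expr = Union[Column, BinaryAnd]


def fold_variadic_and(args: Sequence[Expr]) -> Expr:
    """Left-fold one or more arguments into nested binary ANDs."""
    if not args:
        # Assumption for this sketch: the zero-argument case is not handled.
        raise ValueError("and requires at least one argument")
    # reduce calls BinaryAnd(accumulated, next_arg) for each further argument;
    # a single argument is returned unchanged.
    return reduce(BinaryAnd, args)


def render(expr: Expr) -> str:
    """Render the expression tree as text so the nesting is visible."""
    if isinstance(expr, Column):
        return expr.name
    return f"({render(expr.left)} AND {render(expr.right)})"


print(render(fold_variadic_and([Column("x"), Column("y"), Column("z")])))
# prints ((x AND y) AND z)
```

Because AND is associative, a left fold keeps the same semantics as the three-argument call, and the same approach should carry over to or.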

What is interesting is that these functions have been defined as variadic for a long time, yet this failure has only surfaced now. I wonder whether something has triggered Isthmus to start exploiting the variadic form.
