kevinwallimann/abris-nullability

Spark NullIntolerant Operators may cause schema mismatches in ABRiS

This repo demonstrates how NullIntolerant expressions can turn a previously nullable column into a non-nullable one after query optimization. The nullability only changes when the query is evaluated, so printSchema can still show nullable=true for a column that will later be non-nullable.

This is problematic for ABRiS because it relies on the AvroSerializer, which in turn relies on nullability information that is evaluated lazily, i.e. after query optimization.
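One way to observe the discrepancy is to compare the schema that printSchema reports with the schema carried by the physical plan. The following is a minimal sketch, not the code from this repo: the object name, the helper functions, and the sample rows are hypothetical, and it assumes a Spark version in which the filter operator of the physical plan marks columns guarded by an inferred not-null check as non-nullable.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object NullabilitySketch {
  // Hypothetical input: a single nullable integer column named value1.
  // An RDD-backed DataFrame is used on the assumption that the optimizer
  // then keeps a real filter operator in the plan.
  def buildInput(spark: SparkSession): DataFrame = {
    val schema = StructType(Seq(StructField("value1", IntegerType, nullable = true)))
    val rows = spark.sparkContext.parallelize(Seq(Row(42), Row(null), Row(7)))
    spark.createDataFrame(rows, schema)
  }

  // Nullability from the analyzed plan, i.e. what printSchema shows.
  def analyzedNullable(df: DataFrame): Boolean =
    df.schema("value1").nullable

  // Nullability carried by the physical plan, after the optimizer has run.
  def physicalNullable(df: DataFrame): Boolean =
    df.queryExecution.executedPlan.schema("value1").nullable

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullability-sketch")
      .getOrCreate()

    val filtered = buildInput(spark).filter(col("value1") === lit(42))

    filtered.printSchema()
    println(s"analyzed plan nullable: ${analyzedNullable(filtered)}")
    println(s"physical plan nullable: ${physicalNullable(filtered)}")

    spark.stop()
  }
}
```

If the two printed values differ, a serializer that reads the nullability at execution time sees a different schema than the one inspected beforehand with printSchema.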

Therefore, an innocent-looking === fails the query in this example:

      inputDf.filter(col("value1") === lit(42)) // causes IncompatibleSchemaException later on

because === builds an EqualTo expression, which extends the NullIntolerant trait, and this makes value1 non-nullable after optimization. To retain the nullability, use the eqNullSafe operator (<=>) instead.
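The workaround could look like the sketch below. The object name, helper functions, and sample rows are hypothetical and only illustrate the two equivalent spellings of the null-safe comparison on Spark's Column API. When the comparison value is a non-null literal, both operators keep the same rows: a null value1 yields null under === and false under <=>, and the filter drops the row either way.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object EqNullSafeSketch {
  // Hypothetical input: a single nullable integer column named value1.
  def buildInput(spark: SparkSession): DataFrame = {
    val schema = StructType(Seq(StructField("value1", IntegerType, nullable = true)))
    val rows = spark.sparkContext.parallelize(Seq(Row(42), Row(null), Row(7)))
    spark.createDataFrame(rows, schema)
  }

  // Null-safe equality via the method name.
  def nullSafeFilter(df: DataFrame): DataFrame =
    df.filter(col("value1").eqNullSafe(lit(42)))

  // The same comparison written with the <=> operator.
  def nullSafeFilterOperator(df: DataFrame): DataFrame =
    df.filter(col("value1") <=> lit(42))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("eq-null-safe-sketch")
      .getOrCreate()

    val filtered = nullSafeFilter(buildInput(spark))
    filtered.show()
    // Inspect the physical plan's schema to check that value1 stays nullable.
    println(filtered.queryExecution.executedPlan.schema.treeString)

    spark.stop()
  }
}
```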
