# typedudf

## Installation

In `build.sbt`, add to `libraryDependencies`:

```scala
"com.github.lesbroot" %% "typedudf" % "1.1.6"
```

## Motivation

Spark SQL user-defined functions don't support deserializing struct types into Product types (case classes, tuples, etc.), so you have to access the fields of a `Row` manually:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession in scope as `spark`, e.g. in spark-shell

case class Foo(x: Int, y: String)

// Pack all columns into a single struct column named "foo".
val df = spark.createDataFrame(Seq(Foo(1, "asd"), Foo(2, "qwe"))).select(struct("*").as("foo"))

// The UDF only sees a generic Row, so each field must be extracted by name and type.
val fooUdf = udf((row: Row) => row.getAs[Int]("x") + row.getAs[String]("y").length)
df.withColumn("sum", fooUdf($"foo"))
```

See [SPARK-12823](https://issues.apache.org/jira/browse/SPARK-12823) for the upstream issue.

## Usage

typedudf derives the boilerplate needed to construct Product types from the input type parameters of a given function:

```scala
import typedudf.TypedUdf
import typedudf.ParamEncoder._

// The struct column is decoded into Foo automatically; no manual Row access needed.
val fooUdf = TypedUdf((foo: Foo) => foo.x + foo.y.length)
df.withColumn("sum", fooUdf($"foo"))
```
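If the encoder derivation is recursive (the README doesn't confirm this), nested Product types might decode the same way. A hypothetical sketch; `Bar`, `df2`, and the `bar` column are illustrative, not part of the original example:

```scala
import typedudf.TypedUdf
import typedudf.ParamEncoder._

// Hypothetical nested case class; assumes typedudf handles nested Products.
case class Bar(foo: Foo, weight: Double)

val df2 = spark.createDataFrame(Seq(Bar(Foo(1, "asd"), 2.0))).select(struct("*").as("bar"))
val barUdf = TypedUdf((bar: Bar) => (bar.foo.x + bar.foo.y.length) * bar.weight)
df2.withColumn("score", barUdf($"bar"))
```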

## TODO

- add more tests
- benchmark
- CI