Coverage of Spark types #228

Open
27 of 39 tasks
jserranohidalgo opened this issue Jun 10, 2022 · 0 comments

This issue tracks coverage of Spark types by doric. For each Spark type, doric must allow us:

  • To create a doric column of the corresponding Scala type
  • To collect Scala values from fields of Rows
  • To create doric columns of the corresponding Scala type from literal values

The underlying Spark data type assigned by doric for a given Scala type T should be the data type resolved by Spark's schemaFor[T]. This data type will be determined statically by doric (through implicits), unlike the reflective approach followed by Spark.
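
As a rough illustration of what "determined statically through implicits" could look like, here is a minimal sketch. It is not doric's actual implementation: the `SparkTypeOf` type class and the small `DataType` hierarchy are hypothetical stand-ins, defined locally so that the example runs without Spark on the classpath. The nullability choices are meant to mirror what Spark's schemaFor reports for these types, but treat them as assumptions.

```scala
// Hypothetical stand-ins for a few of Spark's data types (not the real classes).
sealed trait DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
final case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
    extends DataType

// Hypothetical type class: compile-time evidence that Scala type T has a Spark data type.
trait SparkTypeOf[T] {
  def dataType: DataType
  def nullable: Boolean
}

object SparkTypeOf {
  // Summoner: SparkTypeOf[Int] resolves the instance at compile time.
  def apply[T](implicit ev: SparkTypeOf[T]): SparkTypeOf[T] = ev

  private def instance[T](dt: DataType, isNullable: Boolean): SparkTypeOf[T] =
    new SparkTypeOf[T] {
      val dataType: DataType = dt
      val nullable: Boolean = isNullable
    }

  // Primitive and boxed types share a data type but differ in nullability.
  implicit val intType: SparkTypeOf[Int] = instance(IntegerType, isNullable = false)
  implicit val boxedIntType: SparkTypeOf[java.lang.Integer] =
    instance(IntegerType, isNullable = true)
  implicit val longType: SparkTypeOf[Long] = instance(LongType, isNullable = false)
  implicit val stringType: SparkTypeOf[String] = instance(StringType, isNullable = true)

  // Option[T] keeps the data type of T and only makes it nullable.
  implicit def optionType[T](implicit ev: SparkTypeOf[T]): SparkTypeOf[Option[T]] =
    instance(ev.dataType, isNullable = true)

  // Composite types are derived from the evidence of their element types.
  implicit def seqType[T](implicit ev: SparkTypeOf[T]): SparkTypeOf[Seq[T]] =
    instance(ArrayType(ev.dataType, ev.nullable), isNullable = true)

  implicit def mapType[K, V](
      implicit k: SparkTypeOf[K],
      v: SparkTypeOf[V]
  ): SparkTypeOf[Map[K, V]] =
    instance(MapType(k.dataType, v.dataType, v.nullable), isNullable = true)
}
```

With this encoding, `SparkTypeOf[Seq[Option[Int]]].dataType` resolves to `ArrayType(IntegerType, true)` without any runtime reflection, and asking for a type with no instance in scope is a compile error rather than a runtime failure. Each unchecked item in the list below would correspond to a missing instance of this kind.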

List of types (as of Spark 3.2.1):

Null type

  • NullType
    • Null

Numeric types

  • IntegerType

    • Int
    • java.lang.Integer
  • LongType

    • Long
    • java.lang.Long
  • FloatType

    • Float
    • java.lang.Float
  • DoubleType

    • Double
    • java.lang.Double
  • ShortType

    • Short
    • java.lang.Short
  • ByteType

    • Byte
    • java.lang.Byte
  • DecimalType

    • Decimal
    • BigDecimal
    • java.math.BigDecimal
    • java.math.BigInteger
    • scala.math.BigInt

String types

  • StringType
    • String
    • Enumeration#Value
    • java.lang.Enum[_]

Binary type

  • BinaryType
    • Array[Byte]

Boolean type

  • BooleanType
    • Boolean
    • java.lang.Boolean

Datetime type

  • DateType

    • java.sql.Date
    • java.time.LocalDate
  • TimestampType

    • java.sql.Timestamp
    • java.time.Instant
  • CalendarIntervalType

    • org.apache.spark.unsafe.types.CalendarInterval

Interval type

  • DayTimeIntervalType

    • java.time.Duration
  • YearMonthIntervalType

    • java.time.Period

Array type

  • ArrayType
    • Array[_]
    • Seq[_]
    • Set[_]

Map type

  • MapType
    • Map[_, _]

Option types

  • Spark type for T
    • Option[T]

Struct types

  • StructType
    • Product (standard or user-defined case classes, in particular)
    • Row

User-defined types

  • Spark type
    • SQLUserDefinedType