This repository has been archived by the owner on Jul 3, 2020. It is now read-only.

Frames with mixed types: Determine a column's type? Cast from Any to a more specific type? #86

Closed
JeffreyBenjaminBrown opened this issue Oct 27, 2019 · 2 comments

JeffreyBenjaminBrown commented Oct 27, 2019

How to determine column types in a Frame with mixed types?

(Here's another issue on heterogeneous data.)

After some head-scratching I managed to create a data frame with mixed types:

import org.saddle.io._
val u = org.saddle.Series( 0,1,2 )
val v = org.saddle.Series( "0","1","2" )
val f = org.saddle.Frame(
  u . asInstanceOf[ org.saddle.Series[Int,Any] ],
  v . asInstanceOf[ org.saddle.Series[Int,Any] ] )

Suppose one were to discover (say, months after forgetting why it was made) a mysterious Frame with such mixed types. Without retracing the code that generates it, how could one determine the type of its columns?

For instance, in the value f defined above, the elements at (0,0) and (0,1) look indistinguishable to me:

scala> f.at(0,0)
res77: org.saddle.scalar.Scalar[Any] = 0

scala> f.at(0,1)
res78: org.saddle.scalar.Scalar[Any] = 0

But one is a number and the other is a string.

I am hoping for something like the dtype and dtypes methods from Python's Pandas. (I searched the codebase and didn't find the string "dtype".) Among other uses, that would let someone verify that the contents of a column are all of the same type. Violations of that seem both easy to create and difficult to debug.

@JeffreyBenjaminBrown JeffreyBenjaminBrown changed the title How to determine column types in a Frame with mixed types? Frames with mixed types: how to determine a column's type, how to cast from Any to a more specific type Oct 27, 2019
JeffreyBenjaminBrown commented Oct 27, 2019

Continuing the previous example, suppose we have the following mixed-type data frame:

import org.saddle.io._
val u = org.saddle.Series( 0,1,2 )
val v = org.saddle.Series( "0","1","2" )
val f = org.saddle.Frame(
  "u" -> u . asInstanceOf[ org.saddle.Series[Int,Any] ],
  "v" -> v . asInstanceOf[ org.saddle.Series[Int,Any] ] )

and we'd like to extract a column from it. The following will work:

val u2 : org.saddle.Series[Int,Any] =
  f . col("u") . colAt(0)

But that's unsafe, because the element type is Any. The compiler would permit me to do something dumb like append a list of empty maps to the end of it.

This, on the other hand, doesn't work:

val u3 : org.saddle.Series[Int,Int] = ( f
  . col("u") . colAt(0) )

It generates the following error:

error: type mismatch;
 found   : org.saddle.Series[Int,Any]
 required: org.saddle.Series[Int,Int]
Note: Any >: Int, but class Series is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
         . col("u") . colAt(0) )
                           ^
org.saddle.Series[Int,Any] <: org.saddle.Series[Int,Int]?
false

If a Frame's contents are of type Any, but I know that a given column in it has only integers, can I extract a Series of Ints from it, or does Any propagate to the type of everything derived from the Frame?

@JeffreyBenjaminBrown JeffreyBenjaminBrown changed the title Frames with mixed types: how to determine a column's type, how to cast from Any to a more specific type Frames with mixed types: Determine a column's type? Cast from Any to a more specific type? Oct 27, 2019

JeffreyBenjaminBrown commented Oct 27, 2019

Solved. Not beautiful, but maybe good enough to keep me out of trouble.

Determine the type of a cell:

data.colAt(0).raw(0).getClass
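The single-cell check above can be extended to a whole column, which gets closer to what dtype would offer. This is only a minimal sketch: it assumes the column's values have already been collected into a plain Seq[Any] (for example by calling raw at each position), and runtimeClasses and isHomogeneous are hypothetical helpers, not part of Saddle.

```scala
// Hypothetical helpers, not part of Saddle. They work on a plain Seq[Any];
// collecting a Saddle column into such a Seq (e.g. via raw at each position)
// is an assumption and is not shown here.

// Names of the distinct runtime classes found among the values.
def runtimeClasses(values: Seq[Any]): Set[String] =
  values.map {
    case null => "null"
    case v    => v.getClass.getName
  }.toSet

// True when every value has the same runtime class (or the column is empty).
def isHomogeneous(values: Seq[Any]): Boolean =
  runtimeClasses(values).size <= 1

val ints: Seq[Any]  = Seq(0, 1, 2)
val mixed: Seq[Any] = Seq(0, "1", 2)

println(runtimeClasses(ints))
println(runtimeClasses(mixed))
println(isHomogeneous(mixed))
```

Note that Ints stored as Any are boxed, so they report as java.lang.Integer rather than as a primitive int.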

Cast a column of Any values to something more specific:

import org.saddle._

val u = org.saddle.Series(0,1,2)
val v = org.saddle.Series("a","b","c")
val data = org.saddle.Frame(
  u . asInstanceOf[ org.saddle.Series[Int,Any] ],
  v . asInstanceOf[ org.saddle.Series[Int,Any] ] )

def getNumericCol[A,B] (
    // Unsafe -- could be called where it makes no sense
    f : org.saddle.Frame[A,B,Any],
    i : B )
    : org.saddle.Series[A,Int] = {
  f . col(i) . colAt(0) .
    asInstanceOf[ org.saddle.Series[A,Int] ] }

getNumericCol(data,0) // Works!
getNumericCol(data,1) // "Successfully" called where it makes no sense.
          // I wish it threw an error.
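The reason getNumericCol(data,1) "succeeds" is type erasure: asInstanceOf on a generic type only checks the outer class at runtime, not the element type, so the failure is deferred until an element is actually used as an Int. One way to fail fast is to check every element before narrowing. A minimal sketch, again assuming the column's values have been pulled into a plain Seq[Any] first; asInts is a hypothetical helper, not part of Saddle:

```scala
// Hypothetical helper, not part of Saddle: narrow a column of Any values to
// Seq[Int], throwing immediately if any element is not an Int.
def asInts(values: Seq[Any]): Seq[Int] =
  values.zipWithIndex.map {
    case (n: Int, _) => n
    case (other, i) =>
      val found = if (other == null) "null" else other.getClass.getName
      throw new ClassCastException(s"element $i is $found, expected Int")
  }

// Every element is an Int, so the narrowing goes through.
val ok: Seq[Int] = asInts(Seq[Any](0, 1, 2))

// A column of strings is rejected up front; Try captures the exception.
val bad = scala.util.Try(asInts(Seq[Any]("a", "b", "c")))
println(bad.isFailure)
```

The checked values could then be used to build a properly typed Series, at the cost of one pass over the column.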
