# 47 examples of Red Amber

Last update: July 15, 2022 / RedAmber Version 0.1.7

## 1. Install

Install requirements before you install Red Amber.

- Apache Arrow GLib (>= 8.0.0)

- Apache Parquet GLib (>= 8.0.0)  # only if you need to read from or write to Parquet resources.

  See [Apache Arrow install document](https://arrow.apache.org/install/).
  
  A minimal installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section of the CI test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.

Then add this line to your Gemfile:
```
gem 'red_amber'
```

And then execute:
```
$ bundle install
```

Or install it yourself as:
```
$ gem install red_amber
```

## 2. Require

In [1]:
require 'red_amber' # require 'red-amber' is also OK
include RedAmber
VERSION

"0.1.7"

## 3. Initialize

In [2]:
# From a Hash
DataFrame.new(x: [1, 2, 3], y: %w[A B C])

x,y
1,A
2,B
3,C


In [3]:
# From a schema (Hash) and rows (Array of Arrays)
DataFrame.new({ x: :uint8, y: :string }, [[1, 'A'], [2, 'B'], [3, 'C']])

x,y
1,A
2,B
3,C


In [4]:
# From an Arrow::Table
table = Arrow::Table.new(x: [1, 2, 3], y: %w[A B C])
DataFrame.new(table)

x,y
1,A
2,B
3,C


In [5]:
# From a Rover::DataFrame
require 'rover'
rover = Rover::DataFrame.new(x: [1, 2, 3], y: %w[A B C])
DataFrame.new(rover)

x,y
1,A
2,B
3,C


In [6]:
# From red-datasets
require 'datasets-arrow'
dataset = Datasets::Penguins.new
penguins = DataFrame.new(dataset.to_arrow)

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,(nil),(nil),(nil),(nil),(nil),2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,50.4,15.7,222,5750,male,2009
Gentoo,Biscoe,45.2,14.8,212,5200,female,2009
Gentoo,Biscoe,49.9,16.1,213,5400,male,2009


In a future version, this should become:
```ruby
require 'datasets-red-amber'
penguins = Datasets::Penguins.new.to_red_amber
```

In [7]:
dataset = Datasets::Rdatasets.new('datasets', 'mtcars')
mtcars = DataFrame.new(dataset.to_arrow)

mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
19.7,6,145.0,175,3.62,2.77,15.5,0,1,5,6
15.0,8,301.0,335,3.54,3.57,14.6,0,1,5,8
21.4,4,121.0,109,4.11,2.78,18.6,1,1,4,2


## 4. Load

`RedAmber::DataFrame` delegates `.load` to `Arrow::Table.load`. We can load from `.arrow`, `.arrows`, `.csv`, `.csv.gz`, and `.tsv` files.

In [8]:
DataFrame.load("test/entity/with_header.csv")

name,age
Yasuko,68
Rui,49
Hinata,28


## 5. Load from a URI

In [9]:
uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
DataFrame.load(uri)

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181,3750,MALE
Adelie,Torgersen,39.5,17.4,186,3800,FEMALE
Adelie,Torgersen,40.3,18.0,195,3250,FEMALE
Adelie,Torgersen,(nil),(nil),(nil),(nil),
⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,50.4,15.7,222,5750,MALE
Gentoo,Biscoe,45.2,14.8,212,5200,FEMALE
Gentoo,Biscoe,49.9,16.1,213,5400,MALE


## 6. Save

In [10]:
penguins.save("file.arrow")
penguins.save("file.arrows")
penguins.save("file.csv")
penguins.save("file.csv.gz")
penguins.save("file.tsv")
penguins.save("file.feather")

true

## 7. to_s/inspect

`to_s` or `inspect` (which uses `to_s` internally) shows a preview of the DataFrame.

It shows the first 5 and the last 3 rows if the DataFrame has many rows. Columns are also omitted if a line would exceed 80 characters.

In [11]:
df = DataFrame.new(
  x: [1, 2, 3, 4, 5],
  y: [1, 2, 3, 0/0.0, nil],
  s: %w[A B C D] << nil,
  b: [true, false, true, false, nil])
p df; nil

#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000000f8fc>
        x        y s        b
  <uint8> <double> <string> <boolean>
1       1      1.0 A        true
2       2      2.0 B        false
3       3      3.0 C        true
4       4      NaN D        false
5       5    (nil) (nil)    (nil)



In [12]:
p penguins; nil

#<RedAmber::DataFrame : 344 x 8 Vectors, 0x000000000000f8ac>
    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
    <string> <string>        <double>      <double>           <uint8> ... <uint16>
  1 Adelie   Torgersen           39.1          18.7               181 ...     2007
  2 Adelie   Torgersen           39.5          17.4               186 ...     2007
  3 Adelie   Torgersen           40.3          18.0               195 ...     2007
  4 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007
  5 Adelie   Torgersen           36.7          19.3               193 ...     2007
  : :        :                      :             :                 : ...        :
342 Gentoo   Biscoe              50.4          15.7               222 ...     2009
343 Gentoo   Biscoe              45.2          14.8               212 ...     2009
344 Gentoo   Biscoe              49.9          16.1               213 ...     2009



## 8. Show table

In [13]:
df.table

#<Arrow::Table:0x113637c20 ptr=0x7fcc504bb870>
	x	         y	s	b
0	1	  1.000000	A	true
1	2	  2.000000	B	false
2	3	  3.000000	C	true
3	4	       NaN	D	false
4	5	    (null)	(null)	(null)


In [14]:
penguins.table

#<Arrow::Table:0x10fcb7c20 ptr=0x7fcc5057dc70>
	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
  0	Adelie 	Torgersen	     39.100000	    18.700000	              181	       3750	male	2007
  1	Adelie 	Torgersen	     39.500000	    17.400000	              186	       3800	female	2007
  2	Adelie 	Torgersen	     40.300000	    18.000000	              195	       3250	female	2007
  3	Adelie 	Torgersen	        (null)	       (null)	           (null)	     (null)	(null)	2007
  4	Adelie 	Torgersen	     36.700000	    19.300000	              193	       3450	female	2007
  5	Adelie 	Torgersen	     39.300000	    20.600000	              190	       3650	male	2007
  6	Adelie 	Torgersen	     38.900000	    17.800000	              181	       3625	female	2007
  7	Adelie 	Torgersen	     39.200000	    19.600000	              195	       4675	male	2007
  8	Adelie 	Torgersen	     34.100000	    18.100000	              193	       3475	(null)	2007
  9	Adelie 	Torgersen	     42.000000	 

In [15]:
# This is a Red Arrow feature
puts df.table.to_s(format: :column)

x: uint8
y: double
s: string
b: bool
----
x:
  [
    [
      1,
      2,
      3,
      4,
      5
    ]
  ]
y:
  [
    [
      1,
      2,
      3,
      nan,
      null
    ]
  ]
s:
  [
    [
      "A",
      "B",
      "C",
      "D",
      null
    ]
  ]
b:
  [
    [
      true,
      false,
      true,
      false,
      null
    ]
  ]


In [16]:
# This is also a Red Arrow feature
puts df.table.to_s(format: :list)

x: 1
y:   1.000000
s: A
b: true
x: 2
y:   2.000000
s: B
b: false
x: 3
y:   3.000000
s: C
b: true
x: 4
y:        NaN
s: D
b: false
x: 5
y: (null)
s: (null)
b: (null)


## 9. TDR

TDR means 'Transposed Dataframe Representation'. It lays each column out horizontally, in the same shape as the Hash used to initialize a DataFrame. TDR shows information that is useful for exploratory data processing.

- DataFrame shape: n_rows x n_columns
- Data types
- Levels: number of unique elements
- Data preview: identical values are aggregated if the level is small (tally mode)
- Counts of abnormal elements: NaN and nil

In [17]:
# use the same dataframe as #7
df.tdr

RedAmber::DataFrame : 5 x 4 Vectors
Vectors : 2 numeric, 1 string, 1 boolean
# key type    level data_preview
1 :x  uint8       5 [1, 2, 3, 4, 5]
2 :y  double      5 [1.0, 2.0, 3.0, NaN, nil], 1 NaN, 1 nil
3 :s  string      5 ["A", "B", "C", "D", nil], 1 nil
4 :b  boolean     3 {true=>2, false=>2, nil=>1}


In [18]:
penguins.tdr

RedAmber::DataFrame : 344 x 8 Vectors
Vectors : 5 numeric, 3 strings
# key                type   level data_preview
1 :species           string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
2 :island            string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
5 :flipper_length_mm uint8     56 [181, 186, 195, nil, 193, ... ], 2 nils
6 :body_mass_g       uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
7 :sex               string     3 {"male"=>168, "female"=>165, nil=>11}
8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}


`#tdr` has some options:

`limit` : limits the number of variables to show. Default value is `limit=10`.

In [19]:
penguins.tdr(3)

RedAmber::DataFrame : 344 x 8 Vectors
Vectors : 5 numeric, 3 strings
# key                type   level data_preview
1 :species           string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
2 :island            string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
 ... 5 more Vectors ...


`elements` : the maximum number of elements to show in the data preview. Default value is `elements: 5`.

In [20]:
penguins.tdr(elements: 3) # Show first 3 items in data

RedAmber::DataFrame : 344 x 8 Vectors
Vectors : 5 numeric, 3 strings
# key                type   level data_preview
1 :species           string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
2 :island            string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, ... ], 2 nils
4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, ... ], 2 nils
5 :flipper_length_mm uint8     56 [181, 186, 195, ... ], 2 nils
6 :body_mass_g       uint16    95 [3750, 3800, 3250, ... ], 2 nils
7 :sex               string     3 {"male"=>168, "female"=>165, nil=>11}
8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}


`tally` : the maximum level for which tally mode is used. Level means the size of the tallied Hash. Default value is `tally: 5`.

In [21]:
penguins.tdr(tally: 0) # Don't use tally mode

RedAmber::DataFrame : 344 x 8 Vectors
Vectors : 5 numeric, 3 strings
# key                type   level data_preview
1 :species           string     3 ["Adelie", "Adelie", "Adelie", "Adelie", "Adelie", ... ]
2 :island            string     3 ["Torgersen", "Torgersen", "Torgersen", "Torgersen", "Torgersen", ... ]
3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
5 :flipper_length_mm uint8     56 [181, 186, 195, nil, 193, ... ], 2 nils
6 :body_mass_g       uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
7 :sex               string     3 ["male", "female", "female", nil, "female", ... ], 11 nils
8 :year              uint16     3 [2007, 2007, 2007, 2007, 2007, ... ]


`#tdr_str` returns a String. `#tdr` does the same thing as `puts #tdr_str`.
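
The 'level' and tally-mode ideas can be illustrated with plain Ruby arrays. This is only a sketch of the concept, not RedAmber's implementation; the sample data, the `tally_limit` variable, and the `level <= tally_limit` rule are assumptions made for illustration.

```ruby
# Plain-Ruby sketch of "level" and tally mode (not RedAmber's implementation).
species = %w[Adelie Adelie Gentoo Chinstrap Gentoo Adelie] # hypothetical sample

level = species.uniq.size # number of unique elements (a nil would count as one)
tally_limit = 5           # assumed threshold, mirroring the `tally: 5` default

# Aggregate identical values when there are few of them; otherwise show the head.
preview = level <= tally_limit ? species.tally : species.first(5)

puts "level=#{level} preview=#{preview}"
```

With only three unique values the preview becomes a Hash of counts, which is the same shape as the `:species` row in the TDR output above.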

## 10. Size and shape

In [22]:
# same as n_rows, n_obs
df.size

5

In [23]:
# same as n_cols, n_vars
df.n_keys

4

In [24]:
# [df.size, df.n_keys], [df.n_rows, df.n_cols]
df.shape

[5, 4]

## 11. Keys

In [25]:
df.keys

[:x, :y, :s, :b]

In [26]:
penguins.keys

[:species, :island, :bill_length_mm, :bill_depth_mm, :flipper_length_mm, :body_mass_g, :sex, :year]

## 12. Types

In [27]:
df.types

[:uint8, :double, :string, :boolean]

In [28]:
penguins.types

[:string, :string, :double, :double, :uint8, :uint16, :string, :uint16]

## 13. Data type classes

In [29]:
df.type_classes

[Arrow::UInt8DataType, Arrow::DoubleDataType, Arrow::StringDataType, Arrow::BooleanDataType]

In [30]:
penguins.type_classes

[Arrow::StringDataType, Arrow::StringDataType, Arrow::DoubleDataType, Arrow::DoubleDataType, Arrow::UInt8DataType, Arrow::UInt16DataType, Arrow::StringDataType, Arrow::UInt16DataType]

## 14. Indices

In [31]:
df.indexes
# or
df.indices

[0, 1, 2, 3, 4]

## 15. To an Array or a Hash

`DataFrame#to_a` returns an array of row-oriented data without a header.

In [32]:
df.to_a

[[1, 1.0, "A", true], [2, 2.0, "B", false], [3, 3.0, "C", true], [4, NaN, "D", false], [5, nil, nil, nil]]

If you need a column-oriented array with keys, use `.to_h.to_a`.

In [33]:
df.to_h

{:x=>[1, 2, 3, 4, 5], :y=>[1.0, 2.0, 3.0, NaN, nil], :s=>["A", "B", "C", "D", nil], :b=>[true, false, true, false, nil]}

In [34]:
df.to_h.to_a

[[:x, [1, 2, 3, 4, 5]], [:y, [1.0, 2.0, 3.0, NaN, nil]], [:s, ["A", "B", "C", "D", nil]], [:b, [true, false, true, false, nil]]]
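
The relationship between the row-oriented and column-oriented forms can be sketched with a plain Hash of Arrays. The `columns` Hash here is a hypothetical stand-in for a DataFrame, so this shows only the shape of the data, not how RedAmber converts it.

```ruby
# Plain-Ruby sketch: row-oriented vs column-oriented views of the same data.
columns = { x: [1, 2, 3], y: %w[A B C] } # hypothetical stand-in for a DataFrame

rows  = columns.values.transpose # row-oriented, no header (like to_a)
pairs = columns.to_a             # column-oriented with keys (like to_h.to_a)

# Transposing the rows again and re-attaching the keys restores the Hash.
rebuilt = columns.keys.zip(rows.transpose).to_h

p rows
p pairs
p rebuilt == columns
```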

## 16. Schema

In [35]:
df.schema

{:x=>:uint8, :y=>:double, :s=>:string, :b=>:boolean}

## 17. Vector

Each variable (column in the table) is represented by a Vector object.

In [36]:
df[:x] # This syntax is explained later

#<RedAmber::Vector(:uint8, size=5):0x000000000000f910>
[1, 2, 3, 4, 5]


Or create a new Vector with the constructor.

In [37]:
Vector.new(1, 2, 3, 4, 5)

#<RedAmber::Vector(:uint8, size=5):0x000000000000f924>
[1, 2, 3, 4, 5]


In [38]:
Vector.new(1..5)

#<RedAmber::Vector(:uint8, size=5):0x000000000000f938>
[1, 2, 3, 4, 5]


In [39]:
Vector.new([1, 2, 3], [4, 5])

#<RedAmber::Vector(:uint8, size=5):0x000000000000f94c>
[1, 2, 3, 4, 5]


In [40]:
array = Arrow::Array.new([1, 2, 3, 4, 5])
Vector.new(array)

#<RedAmber::Vector(:uint8, size=5):0x000000000000f960>
[1, 2, 3, 4, 5]


- TODO: `Vector[1..5]` as a constructor

## 18. Vectors

Returns an Array of Vectors in a DataFrame.

In [41]:
df.vectors

[#<RedAmber::Vector(:uint8, size=5):0x000000000000f910>
[1, 2, 3, 4, 5]
, #<RedAmber::Vector(:double, size=5):0x000000000000f974>
[1.0, 2.0, 3.0, NaN, nil]
, #<RedAmber::Vector(:string, size=5):0x000000000000f988>
["A", "B", "C", "D", nil]
, #<RedAmber::Vector(:boolean, size=5):0x000000000000f99c>
[true, false, true, false, nil]
]

## 19. Variables

Returns key and Vector pairs in a Hash.

In [42]:
df.variables

{:x=>#<RedAmber::Vector(:uint8, size=5):0x000000000000f910>
[1, 2, 3, 4, 5]
, :y=>#<RedAmber::Vector(:double, size=5):0x000000000000f974>
[1.0, 2.0, 3.0, NaN, nil]
, :s=>#<RedAmber::Vector(:string, size=5):0x000000000000f988>
["A", "B", "C", "D", nil]
, :b=>#<RedAmber::Vector(:boolean, size=5):0x000000000000f99c>
[true, false, true, false, nil]
}

## 20. Select columns by #[ ]

`DataFrame#[]` is overloaded for both column operations and row operations.

- For columns (variables)
  - Key in a Symbol: `df[:symbol]`
  - Key in a String: `df["string"]`
  - Keys in an Array: `df[:symbol1, "string", :symbol2]`
  - Keys by indices: `df[df.keys[0]]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`

In [43]:
# Keys in a Symbol and a String
df[:x, 'y']

x,y
1,1.0
2,2.0
3,3.0
4,
5,(nil)


In [44]:
# Keys in a Range
df['x'..'y']

x,y
1,1.0
2,2.0
3,3.0
4,
5,(nil)


In [45]:
# Keys with an index Range, and a symbol
df[df.keys[2..], :x]

s,b,x
A,true,1
B,false,2
C,true,3
D,false,4
(nil),(nil),5


## 21. Select rows by #[ ]
`DataFrame#[]` is overloaded for both column operations and row operations.

- For rows (observations)
  - Select rows by an index: `df[index]`
  - Select rows by indices: `df[indices]` # Array, Arrow::Array, and Vector are acceptable for indices
  - Select rows by Ranges: `df[range]`
  - Select rows by booleans: `df[booleans]` # Array, Arrow::Array, and Vector are acceptable for booleans

In [46]:
# indices
df[0, 2, 1]

x,y,s,b
1,1.0,A,True
3,3.0,C,True
2,2.0,B,False


In [47]:
# including a Range
# negative indices are also acceptable
df[1..2, -1]

x,y,s,b
2,2.0,B,false
3,3.0,C,true
5,(nil),(nil),(nil)


In [48]:
# booleans
# the number of booleans must be the same as the size of self
df[false, true, true, false, true]

x,y,s,b
2,2.0,B,false
3,3.0,C,true
5,(nil),(nil),(nil)


In [49]:
# Arrow::Array
indices = Arrow::UInt8Array.new([0,2,4])
df[indices]

x,y,s,b
1,1.0,A,true
3,3.0,C,true
5,(nil),(nil),(nil)


In [50]:
# By a Vector as indices
indices = Vector.new(df.indices)
# indices > 1 returns a boolean Vector
df[indices > 1]

x,y,s,b
3,3.0,C,true
4,,D,false
5,(nil),(nil),(nil)


In [51]:
# By a Vector as booleans
booleans = df[:b]

#<RedAmber::Vector(:boolean, size=5):0x000000000000f99c>
[true, false, true, false, nil]


In [52]:
df[booleans]

x,y,s,b
1,1.0,A,True
3,3.0,C,True


## 22. empty?

In [53]:
df.empty?

false

In [54]:
DataFrame.new.empty?

true

In [55]:
DataFrame.new

(empty DataFrame)

## 23. Select columns by pick

`DataFrame#pick` accepts an Array of keys to pick up columns (variables). You can change the order of columns at the same time.

In [56]:
df.pick(:s, :y)
# or
df.pick([:s, :y]) # OK too.

s,y
A,1.0
B,2.0
C,3.0
D,
(nil),(nil)


Or use a boolean Array of length `n_keys` to `pick`. This style preserves the order of variables.

In [57]:
df.pick(false, true, true, false)
# or
df.pick([false, true, true, false]) # OK

y,s
1.0,A
2.0,B
3.0,C
,D
(nil),(nil)


`#pick` also accepts a block, which is called in the context of self.

The next example picks up the numeric variables.

In [58]:
# the receiver is required with the argument style
df.pick(df.vectors.map(&:numeric?))

# with a block
df.pick { vectors.map(&:numeric?) }

x,y
1,1.0
2,2.0
3,3.0
4,
5,(nil)


The name `pick` comes from the action of picking variables (columns) according to their label keys.

## 24. Reject columns by drop

`DataFrame#drop` accepts an Array of keys and drops those columns (variables) to create the remainder DataFrame.

In [59]:
df.drop(:x, :b)
# df.drop([:x, :b]) #is OK too.

y,s
1.0,A
2.0,B
3.0,C
,D
(nil),(nil)


Or use a boolean Array of length `n_keys` to `drop`.

In [60]:
df.drop(true, false, false, true)
# df.drop([true, false, false, true]) # is OK too

y,s
1.0,A
2.0,B
3.0,C
,D
(nil),(nil)


`#drop` also accepts a block, which is called in the context of self.

The next example drops the variables that contain nil or NaN values.

In [61]:
df.drop { vectors.map { |v| v.is_na.any } }

x
1
2
3
4
5


The argument style is also acceptable, but it requires the receiver 'df'.

In [62]:
df.drop(df.vectors.map { |v| v.is_na.any })

x
1
2
3
4
5


The name `drop` was chosen as the counterpart of `pick`.

## 25. Pick/drop and nil

When `pick` or `drop` is used with booleans, nil in the booleans is treated as false. This behavior is aligned with Ruby's `BasicObject#!`.

In [63]:
booleans = [true, true, false, nil]
booleans_invert = booleans.map(&:!) # => [false, false, true, true] because nil.! is true
df.pick(booleans) == df.drop(booleans_invert)

true

## 26. Vector#invert, #primitive_invert

In [64]:
vector = Vector.new(booleans)

#<RedAmber::Vector(:boolean, size=4):0x000000000000faf0>
[true, true, false, nil]


`Vector#invert` leaves nil as nil.

In [65]:
vector.invert
# or
!vector

#<RedAmber::Vector(:boolean, size=4):0x000000000000fb04>
[false, false, true, nil]


So `df.pick(vector) != df.drop(vector.invert)` when the booleans contain any nil.

On the other hand, `Vector#primitive_invert` follows the behavior of Ruby's `BasicObject#!`, so `pick` and `drop` keep their 'MECE' (mutually exclusive and collectively exhaustive) behavior.

In [66]:
vector.primitive_invert

#<RedAmber::Vector(:boolean, size=4):0x000000000000fb18>
[false, false, true, true]


In [67]:
df.pick(vector) == df.drop(vector.primitive_invert)

true
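
The difference between the two inversions can be reproduced with plain Ruby arrays. In this sketch the `pick` and `drop` lambdas are hypothetical stand-ins that select or reject keys by a boolean mask; they are not RedAmber's methods.

```ruby
# Plain-Ruby sketch of nil-preserving vs Ruby-style inversion of a boolean mask.
keys = %i[x y s b]
mask = [true, true, false, nil]

# Hypothetical stand-ins for pick/drop: nil in the mask is treated as false.
pick = ->(m) { keys.select.with_index { |_, i| m[i] } }
drop = ->(m) { keys.reject.with_index { |_, i| m[i] } }

nil_preserving = mask.map { |e| e.nil? ? nil : !e } # nil stays nil
ruby_style     = mask.map(&:!)                      # nil.! is true

p pick.(mask)           # keys selected by the mask
p drop.(ruby_style)     # same keys: the two masks are complementary
p drop.(nil_preserving) # differs: the nil position is neither picked nor dropped
```

Only the Ruby-style inversion makes picking by a mask and dropping by its inverse return the same keys.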

## 27. Pick/drop and [ ]

When `pick` or `drop` selects a single column (variable), it returns a `DataFrame` with one column (variable).

In [68]:
df.pick(:x) # or
df.drop(:y, :s, :b)

x
1
2
3
4
5


In contrast, when `[]` selects a single column (variable), it returns a `Vector`.

In [69]:
df[:x]

#<RedAmber::Vector(:uint8, size=5):0x000000000000f910>
[1, 2, 3, 4, 5]


This behavior may be useful inside a block of the DataFrame manipulation verbs (such as pick, drop, slice, remove, assign, rename).

## 28. Slice

`slice` selects rows (observations) to create a subset of a DataFrame.

`slice(indices)` accepts indices as arguments. Indices should be Integers, Floats, or Ranges of Integers. Negative indices counted from the tail, as in Ruby's Array, are also acceptable.

In [70]:
# returns 5 rows at the start and 5 rows from the end
penguins.slice(0...5, -5..-1)

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,(nil),(nil),(nil),(nil),(nil),2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,50.4,15.7,222,5750,male,2009
Gentoo,Biscoe,45.2,14.8,212,5200,female,2009
Gentoo,Biscoe,49.9,16.1,213,5400,male,2009


In [71]:
# slice accepts Float index
# 33% of 344 observations => index 113.52 (a Float index)
penguins.slice(penguins.size * 0.33)

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Biscoe,42.2,19.5,197,4275,male,2009


Indices in Vectors or Arrow::Arrays are also acceptable.

Another way to select in `slice` is to use booleans.
- Booleans can be an Array, an Arrow::Array, a Vector, or an Array of them.
- Each data type must be boolean.
- The size of the booleans must be the same as the size of self.

In [72]:
# make booleans that check for values over 40
booleans = penguins[:bill_length_mm] > 40

#<RedAmber::Vector(:boolean, size=344):0x000000000000fb68>
[false, false, true, nil, false, false, false, false, false, true, false, false, ... ]


In [73]:
penguins.slice(booleans)

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,42.0,20.2,190,4250,(nil),2007
Adelie,Torgersen,41.1,17.6,182,3200,female,2007
Adelie,Torgersen,42.5,20.7,197,4500,male,2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,50.4,15.7,222,5750,male,2009
Gentoo,Biscoe,45.2,14.8,212,5200,female,2009
Gentoo,Biscoe,49.9,16.1,213,5400,male,2009


`slice` accepts a block.
- We can't use both arguments and a block at the same time.
- The block should return indices of any length, or a boolean Array with the same length as `size`.
- The block is called in the context of self, so the receiver 'self' can be omitted in the block.

In [74]:
# return a DataFrame where bill_length_mm is within mean ± std (a range 2*std wide)
penguins.slice do
  vector = self[:bill_length_mm]
  min = vector.mean - vector.std
  max = vector.mean + vector.std
  vector.to_a.map { |e| (min..max).include? e }
end

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,39.3,20.6,190,3650,male,2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,47.2,13.7,214,4925,female,2009
Gentoo,Biscoe,46.8,14.3,215,4850,female,2009
Gentoo,Biscoe,45.2,14.8,212,5200,female,2009


## 29. Slice and nil option

`Arrow::Table#slice` uses the `#filter` method with the option `Arrow::FilterOptions.null_selection_behavior = :emit_null`. This propagates nil to the same row.

In [75]:
hash = { a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3] }
table = Arrow::Table.new(hash)
table.slice([true, false, nil])

#<Arrow::Table:0x113e72048 ptr=0x7fcc50a542a0>
	     a	b	         c
0	     1	A	  1.000000
1	(null)	(null)	    (null)


In RedAmber, by contrast, nil in the booleans passed to `DataFrame#slice` is treated as false. This behavior comes from `Arrow::FilterOptions.null_selection_behavior = :drop`, which is the default value for the `Arrow::Table#filter` method.

In [76]:
RedAmber::DataFrame.new(table).slice([true, false, nil]).table

#<Arrow::Table:0x113e51438 ptr=0x7fcc4f7e4ed0>
	a	b	         c
0	1	A	  1.000000
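
The two null-selection behaviors can be mimicked on plain Ruby arrays. The `filter_rows` helper and its `null_selection:` keyword are hypothetical names for this sketch; it imitates the outputs shown above and does not call Arrow.

```ruby
# Plain-Ruby sketch of filtering rows by a mask that contains nil.
# `filter_rows` and `null_selection:` are hypothetical, not Arrow's API.
def filter_rows(rows, mask, null_selection: :drop)
  rows.zip(mask).each_with_object([]) do |(row, flag), out|
    if flag
      out << row                   # true: keep the row
    elsif flag.nil? && null_selection == :emit_null
      out << Array.new(row.size)   # nil: emit a row filled with nil
    end                            # false (or nil with :drop): skip the row
  end
end

rows = [[1, 'A', 1.0], [2, 'B', 2.0], [3, 'C', 3.0]]
mask = [true, false, nil]

dropped = filter_rows(rows, mask)
emitted = filter_rows(rows, mask, null_selection: :emit_null)

p dropped
p emitted
```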


## 30. Remove

Slice and reject rows (observations) to create the remainder DataFrame.

`#remove(indices)` accepts indices as arguments. Indices should be Integers or Ranges of Integers.

In [77]:
# returns the 6th to 339th observations, the remainder of the 1st example in #28
penguins.remove(0...5, -5..-1)

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.3,20.6,190,3650,male,2007
Adelie,Torgersen,38.9,17.8,181,3625,female,2007
Adelie,Torgersen,39.2,19.6,195,4675,male,2007
Adelie,Torgersen,34.1,18.1,193,3475,(nil),2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,44.5,15.7,217,4875,(nil),2009
Gentoo,Biscoe,48.8,16.2,222,6000,male,2009
Gentoo,Biscoe,47.2,13.7,214,4925,female,2009
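
The index arithmetic behind these `slice` and `remove` examples can be checked with a plain Array of row numbers. This sketch uses Ruby's `Array#values_at` as an analogy for selecting by Ranges and negative indices; it does not use RedAmber.

```ruby
# Plain-Ruby sketch: selecting by Ranges and taking the remainder.
n_rows = 344                    # number of penguin observations
indices = (0...n_rows).to_a

selected  = indices.values_at(0...5, -5..-1) # first 5 and last 5 rows
remaining = indices - selected               # what `remove` would keep

p selected
puts "remaining: #{remaining.first}..#{remaining.last} (#{remaining.size} rows)"
```

The remainder runs from index 5 to index 338, which are the 6th to 339th observations.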


`remove(booleans)` accepts booleans as an argument in an Array, a Vector, or an Arrow::BooleanArray. The booleans must have the same length as `#size`.

In [78]:
# remove all observations that contain nil
removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,50.4,15.7,222,5750,male,2009
Gentoo,Biscoe,45.2,14.8,212,5200,female,2009
Gentoo,Biscoe,49.9,16.1,213,5400,male,2009


`remove {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return indices or a boolean Array with the same length as `size`. The block is called in the context of self.

In [79]:
# Remove data within mean ± std (a range 2*std wide)
penguins.remove do
  vector = self[:bill_length_mm]
  min = vector.mean - vector.std
  max = vector.mean + vector.std
  vector.to_a.map { |e| (min..max).include? e }
end

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,(nil),(nil),(nil),(nil),(nil),2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
Adelie,Torgersen,34.1,18.1,193,3475,(nil),2007
Adelie,Torgersen,37.8,17.1,186,3300,(nil),2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,(nil),(nil),(nil),(nil),(nil),2009
Gentoo,Biscoe,50.4,15.7,222,5750,male,2009
Gentoo,Biscoe,49.9,16.1,213,5400,male,2009


## 31. Remove and nil

When `remove` is used with booleans, nil in the booleans is treated as false. This behavior is aligned with Ruby's `BasicObject#!` (`nil.!` is true).

In [80]:
df = RedAmber::DataFrame.new(a: [1, 2, nil], b: %w[A B C], c: [1.0, 2, 3])

a,b,c
1,A,1.0
2,B,2.0
(nil),C,3.0


In [81]:
booleans = df[:a] < 2

#<RedAmber::Vector(:boolean, size=3):0x000000000000fbf4>
[true, false, nil]


In [82]:
booleans_invert = booleans.to_a.map(&:!)

[false, true, true]

In [83]:
df.slice(booleans) == df.remove(booleans_invert)

true

By contrast, `Vector#invert` returns nil for nil elements, which leads to a different result. (See #26)

In [84]:
booleans.invert

#<RedAmber::Vector(:boolean, size=3):0x000000000000fc08>
[false, true, nil]


In [85]:
df.remove(booleans.invert)

a,b,c
1,A,1.0
(nil),C,3.0


Vector has a `#primitive_invert` method. It returns the same result as `.to_a.map(&:!)` above.

In [86]:
booleans.primitive_invert

#<RedAmber::Vector(:boolean, size=3):0x000000000000fc30>
[false, true, true]


In [87]:
df.remove(booleans.primitive_invert)

a,b,c
1,A,1.0


In [88]:
df.slice(booleans) == df.remove(booleans.primitive_invert)

true

## 32. Remove nil

Remove any observations containing nil.

In [89]:
penguins.remove_nil

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,50.4,15.7,222,5750,male,2009
Gentoo,Biscoe,45.2,14.8,212,5200,female,2009
Gentoo,Biscoe,49.9,16.1,213,5400,male,2009


The roundabout way for this is to use `#remove`.

In [90]:
penguins.remove { vectors.map(&:is_nil).reduce(&:|) }

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,36.7,19.3,193,3450,female,2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,50.4,15.7,222,5750,male,2009
Gentoo,Biscoe,45.2,14.8,212,5200,female,2009
Gentoo,Biscoe,49.9,16.1,213,5400,male,2009


## 33. Rename

Rename keys (column names) to create an updated DataFrame.

`#rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}`.

In [91]:
h = { name: %w[Yasuko Rui Hinata], age: [68, 49, 28] }
comecome = RedAmber::DataFrame.new(h)

name,age
Yasuko,68
Rui,49
Hinata,28


In [92]:
comecome.rename(:age => :age_in_1993)

name,age_in_1993
Yasuko,68
Rui,49
Hinata,28


`#rename {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return key_pairs as a Hash of `{existing_key => new_key}`. The block is called in the context of self.

Symbol keys and String keys are distinguished.
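
The `{existing_key => new_key}` mapping works much like renaming the keys of a plain Hash. The sketch below uses Ruby's `Hash#transform_keys` on a hypothetical Hash of columns; it is an analogy only and does not show how RedAmber renames keys.

```ruby
# Plain-Ruby sketch of renaming keys with an {existing_key => new_key} Hash.
columns = { name: %w[Yasuko Rui Hinata], age: [68, 49, 28] }

key_pairs = { age: :age_in_1993 }
renamed = columns.transform_keys { |k| key_pairs.fetch(k, k) }

# In a plain Hash a String key does not match a Symbol key, so nothing changes.
string_pairs = { 'age' => :age_in_1993 }
unchanged = columns.transform_keys { |k| string_pairs.fetch(k, k) }

p renamed.keys
p unchanged.keys
```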

## 34. Assign

Assign new or updated columns (variables) and create an updated DataFrame.

- Columns with new keys are appended as new variables at the right (bottom in TDR).
- Columns with existing keys update the corresponding vectors.

`#assign(key_pairs)` accepts pairs of key and values as arguments. key_pairs should be a Hash of `{key => array}` or `{key => Vector}`.

In [93]:
comecome = RedAmber::DataFrame.new( name: %w[Yasuko Rui Hinata], age: [68, 49, 28] )

name,age
Yasuko,68
Rui,49
Hinata,28


In [94]:
# update :age and add :brother
assigner = { age: [97, 78, 57], brother: ['Santa', nil, 'Momotaro'] }
comecome.assign(assigner)

name,age,brother
Yasuko,97,Santa
Rui,78,(nil)
Hinata,57,Momotaro
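
The update-or-append rule resembles `Hash#merge` on a plain Hash of columns. This is an analogy under that assumption, using a hypothetical `columns` Hash rather than a DataFrame.

```ruby
# Plain-Ruby sketch: existing keys are updated in place, new keys are appended.
columns  = { name: %w[Yasuko Rui Hinata], age: [68, 49, 28] }
assigner = { age: [97, 78, 57], brother: ['Santa', nil, 'Momotaro'] }

updated = columns.merge(assigner)

p updated.keys  # :age keeps its position, :brother is added at the end
p updated[:age]
```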


`#assign {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return pairs of key and values as a Hash of `{key => array}` or `{key => Vector}`. The block is called in the context of self.

In [95]:
df = RedAmber::DataFrame.new(
  index: [0, 1, 2, 3, nil],
  float: [0.0, 1.1,  2.2, Float::NAN, nil],
  string: ['A', 'B', 'C', 'D', nil])

index,float,string
0,0.0,A
1,1.1,B
2,2.2,C
3,,D
(nil),(nil),(nil)


In [96]:
# update numeric variables
df.assign do
  assigner = {}
  vectors.each_with_index do |v, i|
    assigner[keys[i]] = -v if v.numeric?
  end
  assigner
end

index,float,string
0,-0.0,A
255,-1.1,B
254,-2.2,C
253,,D
(nil),(nil),(nil)


## 35. Coerce (Vector)

Vector has a `coerce` method.

In [97]:
vector = RedAmber::Vector.new(1,2,3)

#<RedAmber::Vector(:uint8, size=3):0x000000000000fcf8>
[1, 2, 3]


In [98]:
# Vector's `#*` method
vector * -1

#<RedAmber::Vector(:int16, size=3):0x000000000000fd0c>
[-1, -2, -3]


In [99]:
# coerced calculation
-1 * vector

#<RedAmber::Vector(:int16, size=3):0x000000000000fd20>
[-1, -2, -3]
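
Ruby's `coerce` protocol is what lets a plain number appear on the left-hand side of an operator. The `TinyVector` class below is a hypothetical toy written for this sketch; it shows the protocol itself and is not RedAmber's Vector.

```ruby
# Toy class (hypothetical) demonstrating Ruby's coerce protocol.
class TinyVector
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Element-wise multiplication by a scalar or by another TinyVector.
  def *(other)
    rhs = other.is_a?(TinyVector) ? other.values : Array.new(values.size, other)
    TinyVector.new(values.zip(rhs).map { |a, b| a * b })
  end

  # Called by Integer#* when the right operand is not a Numeric.
  # Returns [converted_left_operand, self]; Ruby then evaluates left * self.
  def coerce(scalar)
    [TinyVector.new(Array.new(values.size, scalar)), self]
  end
end

tiny = TinyVector.new([1, 2, 3])
left  = (tiny * -1).values # calls TinyVector#* directly
right = (-1 * tiny).values # Integer#* falls back to TinyVector#coerce

p left
p right
```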


In [100]:
# `@-` operator
-vector

#<RedAmber::Vector(:uint8, size=3):0x000000000000fd34>
[255, 254, 253]
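
The values 255, 254, 253 are what wraparound (modulo 256) arithmetic gives for an unsigned 8-bit type. The following plain-Ruby sketch reproduces the numbers under that assumption; it does not inspect how Arrow computes the negation.

```ruby
# Plain-Ruby sketch of unsigned 8-bit wraparound on negation.
UINT8_MODULUS = 256 # 2**8 distinct values in an unsigned 8-bit integer

wrapped = [0, 1, 2, 3].map { |v| (-v) % UINT8_MODULUS }

p wrapped # 0 stays 0; 1, 2, 3 wrap around to the top of the range
```

Multiplying by `-1` instead yields a signed result, as in the `:int16` output above, so no wraparound occurs there.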


## 36. to_ary (Vector)

`Vector#to_ary` enables implicit conversion to an Array.

In [101]:
Array(Vector.new([3, 4, 5]))

[3, 4, 5]

In [102]:
[1, 2] + Vector.new([3, 4, 5])

[1, 2, 3, 4, 5]
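
Implicit conversion is a general Ruby convention: methods such as `Array#+` call `to_ary` on their argument. The `Wrapper` class below is a hypothetical toy used only to show that convention.

```ruby
# Toy class (hypothetical) showing what defining to_ary enables in Ruby.
class Wrapper
  def initialize(items)
    @items = items
  end

  # Implicit conversion hook used by Array#+, Kernel#Array, multiple assignment.
  def to_ary
    @items.dup
  end
end

joined = [1, 2] + Wrapper.new([3, 4, 5]) # Array#+ calls to_ary
first, *rest = Wrapper.new([3, 4, 5])    # destructuring also calls to_ary

p joined
p first
p rest
```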

## 37. Fill nil (Vector)

`Vector#fill_nil_forward` or `Vector#fill_nil_backward` propagates
the last valid observation forward (or backward).
A nil is preserved if there is no valid value before it (forward) or after it (backward).

In [103]:
integer = Vector.new([0, 1, nil, 3, nil])
integer.fill_nil_forward

#<RedAmber::Vector(:uint8, size=5):0x000000000000fd48>
[0, 1, 1, 3, 3]


In [104]:
integer.fill_nil_backward

#<RedAmber::Vector(:uint8, size=5):0x000000000000fd5c>
[0, 1, 3, 3, nil]
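
The fill rule can be written in a few lines of plain Ruby. The `fill_forward` and `fill_backward` helpers are hypothetical names for this sketch; they reproduce the results above on Arrays and are not RedAmber's implementation.

```ruby
# Plain-Ruby sketch of forward and backward nil filling.
def fill_forward(values)
  last = nil # most recent non-nil value seen so far
  values.map { |v| v.nil? ? last : (last = v) }
end

def fill_backward(values)
  # Backward filling is forward filling on the reversed sequence.
  fill_forward(values.reverse).reverse
end

data = [0, 1, nil, 3, nil]
forward  = fill_forward(data)
backward = fill_backward(data)

p forward
p backward
```

The trailing nil survives the backward fill because no valid value follows it.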


## 38. all?/any? (Vector)

`Vector#all?` returns true if all elements are true.

`Vector#any?` returns true if any element is true.

These are unary aggregation functions.

In [105]:
booleans = Vector.new([true, true, nil])
booleans.all?

true

In [106]:
booleans.any?

true

If these methods are used with the option `skip_nulls: false`, nil is taken into account.

In [107]:
booleans.all?(skip_nulls: false)

false

In [108]:
booleans.any?(skip_nulls: false)

true

## 39. count/count_uniq (Vector)

`Vector#count` counts elements.

`Vector#count_uniq` counts unique elements. `#count_distinct` is an alias (Arrow's name).

These are unary aggregation functions.

In [109]:
string = Vector.new(%w[A B A])
string.count

3

In [110]:
string.count_uniq # count_distinct is also OK

2

## 40. stddev/variance (Vector)

These are unary aggregation functions.

In [111]:
integers = Vector.new([1, 2, 3, nil])
integers.stddev

0.816496580927726

In [112]:
# Unbiased standard deviation
integers.sd

1.0

In [113]:
integers.variance

0.6666666666666666

In [114]:
# Unbiased variance
integers.var

1.0
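
The two pairs of results differ only in the divisor: `n` for the population statistics and `n - 1` for the unbiased ones. The plain-Ruby sketch below recomputes them, assuming that nil is skipped as the outputs above suggest.

```ruby
# Plain-Ruby sketch: population (divide by n) vs unbiased (divide by n - 1).
values = [1, 2, 3, nil].compact # assumption: nil is ignored
n = values.size

mean = values.sum.to_f / n
sum_of_squares = values.sum { |v| (v - mean)**2 }

population_variance = sum_of_squares / n       # corresponds to `variance`
unbiased_variance   = sum_of_squares / (n - 1) # corresponds to `var`
population_sd = Math.sqrt(population_variance) # corresponds to `stddev`
unbiased_sd   = Math.sqrt(unbiased_variance)   # corresponds to `sd`

puts population_sd, unbiased_sd, population_variance, unbiased_variance
```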

## 41. negate (Vector)

This is a unary element-wise function.

In [115]:
double = Vector.new([1.0, -2, 3])
double.negate

#<RedAmber::Vector(:double, size=3):0x000000000000fd70>
[-1.0, 2.0, -3.0]


In [116]:
# Same as #negate
-double

#<RedAmber::Vector(:double, size=3):0x000000000000fd84>
[-1.0, 2.0, -3.0]


## 42. round (Vector)

Options for `#round`:

- `:n_digits` The number of digits to round to. Negative values round to tens, hundreds, and so on.
- `:mode` The rounding mode.

This is a unary element-wise function.

In [117]:
double = RedAmber::Vector.new([15.15, 2.5, 3.5, -4.5, -5.5])

#<RedAmber::Vector(:double, size=5):0x000000000000fd98>
[15.15, 2.5, 3.5, -4.5, -5.5]


In [118]:
double.round

#<RedAmber::Vector(:double, size=5):0x000000000000fdac>
[15.0, 2.0, 4.0, -4.0, -6.0]


In [119]:
double.round(mode: :half_to_even)

#<RedAmber::Vector(:double, size=5):0x000000000000fdc0>
[15.0, 2.0, 4.0, -4.0, -6.0]


In [120]:
double.round(mode: :towards_infinity)

#<RedAmber::Vector(:double, size=5):0x000000000000fdd4>
[16.0, 3.0, 4.0, -5.0, -6.0]


In [121]:
double.round(mode: :half_up)

#<RedAmber::Vector(:double, size=5):0x000000000000fde8>
[15.0, 3.0, 4.0, -4.0, -5.0]


In [122]:
double.round(mode: :half_towards_zero)

#<RedAmber::Vector(:double, size=5):0x000000000000fdfc>
[15.0, 2.0, 3.0, -4.0, -5.0]


In [123]:
double.round(mode: :half_towards_infinity)

#<RedAmber::Vector(:double, size=5):0x000000000000fe10>
[15.0, 3.0, 4.0, -5.0, -6.0]


In [124]:
double.round(mode: :half_to_odd)

#<RedAmber::Vector(:double, size=5):0x000000000000fe24>
[15.0, 3.0, 3.0, -5.0, -5.0]


In [125]:
double.round(n_digits: 0)

#<RedAmber::Vector(:double, size=5):0x000000000000fe38>
[15.0, 2.0, 4.0, -4.0, -6.0]


In [126]:
double.round(n_digits: 1)

#<RedAmber::Vector(:double, size=5):0x000000000000fe4c>
[15.2, 2.5, 3.5, -4.5, -5.5]


In [127]:
double.round(n_digits: -1)

#<RedAmber::Vector(:double, size=5):0x000000000000fe60>
[20.0, 0.0, 0.0, -0.0, -10.0]
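
For comparison, Ruby's own `Float#round` has a `half:` keyword with three tie-breaking rules. On the sample data, `half: :even` gives the same values as `:half_to_even` above, `half: :up` the same as `:half_towards_infinity`, and `half: :down` the same as `:half_towards_zero`. The sketch below uses only core Ruby and compares outputs on this data; it makes no claim about how RedAmber or Arrow implement rounding.

```ruby
values = [15.15, 2.5, 3.5, -4.5, -5.5]

# Float#round returns an Integer when no digits are given, so convert back to Float.
half_even = values.map { |v| v.round(half: :even).to_f }  # ties go to the even neighbour
half_up   = values.map { |v| v.round(half: :up).to_f }    # ties go away from zero
half_down = values.map { |v| v.round(half: :down).to_f }  # ties go towards zero
```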


## 43. and/or (Vector)

RedAmber uses `and_kleene`/`or_kleene` as the default `&`/`|` methods.

These are binary element-wise functions.

In [128]:
bool_self  = Vector.new([true, true, true, false, false, false, nil, nil, nil])
bool_other = Vector.new([true, false, nil, true, false, nil, true, false, nil])

bool_self & bool_other  # same as bool_self.and_kleene(bool_other)

#<RedAmber::Vector(:boolean, size=9):0x000000000000fe74>
[true, false, nil, false, false, false, nil, false, nil]


In [129]:
# Ruby's primitive `&&` is not element-wise: a Vector is truthy, so this returns bool_other
bool_self && bool_other

#<RedAmber::Vector(:boolean, size=9):0x000000000000fe88>
[true, false, nil, true, false, nil, true, false, nil]


In [130]:
# Arrow's default `and`
bool_self.and_org(bool_other)

#<RedAmber::Vector(:boolean, size=9):0x000000000000fe9c>
[true, false, nil, false, false, nil, nil, nil, nil]


In [131]:
bool_self | bool_other  # same as bool_self.or_kleene(bool_other)

#<RedAmber::Vector(:boolean, size=9):0x000000000000feb0>
[true, true, true, true, false, nil, true, nil, nil]


In [132]:
# Ruby's primitive `||` is not element-wise: a Vector is truthy, so this returns bool_self
bool_self || bool_other

#<RedAmber::Vector(:boolean, size=9):0x000000000000fec4>
[true, true, true, false, false, false, nil, nil, nil]


In [133]:
# Arrow's default `or`
bool_self.or_org(bool_other)

#<RedAmber::Vector(:boolean, size=9):0x000000000000fed8>
[true, true, nil, true, false, nil, nil, nil, nil]
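
The difference between the Kleene and non-Kleene results can be sketched in plain Ruby, treating nil as "unknown". The helper names `kleene_and`, `kleene_or` and `strict_and` are hypothetical and only mirror the outputs shown above; they are not RedAmber's implementation.

```ruby
# Kleene AND: a definite false wins; otherwise nil (unknown) propagates.
def kleene_and(a, b)
  return false if a == false || b == false
  return nil if a.nil? || b.nil?

  true
end

# Kleene OR: a definite true wins; otherwise nil (unknown) propagates.
def kleene_or(a, b)
  return true if a == true || b == true
  return nil if a.nil? || b.nil?

  false
end

# Non-Kleene variant: any nil operand makes the result nil.
def strict_and(a, b)
  a.nil? || b.nil? ? nil : a && b
end

lhs = [true, true, true, false, false, false, nil, nil, nil]
rhs = [true, false, nil, true, false, nil, true, false, nil]

and_result    = lhs.zip(rhs).map { |a, b| kleene_and(a, b) }
or_result     = lhs.zip(rhs).map { |a, b| kleene_or(a, b) }
strict_result = lhs.zip(rhs).map { |a, b| strict_and(a, b) }
```

The only elements where the Kleene and non-Kleene versions differ are those where one operand is nil and the other already decides the answer, such as `false & nil` or `true | nil`.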


## 44. is_finite/is_nan/is_nil/is_na (Vector)

These are unary element-wise functions.

In [134]:
double = Vector.new([Math::PI, Float::INFINITY, -Float::INFINITY, Float::NAN, nil])

#<RedAmber::Vector(:double, size=5):0x000000000000feec>
[3.141592653589793, Infinity, -Infinity, NaN, nil]


In [135]:
double.is_finite

#<RedAmber::Vector(:boolean, size=5):0x000000000000ff00>
[true, false, false, false, nil]


In [136]:
double.is_inf

#<RedAmber::Vector(:boolean, size=5):0x000000000000ff14>
[false, true, true, false, nil]


In [137]:
double.is_na

#<RedAmber::Vector(:boolean, size=5):0x000000000000ff28>
[false, false, false, true, true]


In [138]:
double.is_nil

#<RedAmber::Vector(:boolean, size=5):0x000000000000ff3c>
[false, false, false, false, true]


In [139]:
double.is_valid

#<RedAmber::Vector(:boolean, size=5):0x000000000000ff50>
[true, true, true, true, false]
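
The same checks can be reproduced on a plain Ruby Array with core `Float` predicates, which makes the difference between `is_na` (nil or NaN) and `is_nil` (nil only) explicit. This is a sketch that mirrors the outputs above, not RedAmber's implementation.

```ruby
values = [Math::PI, Float::INFINITY, -Float::INFINITY, Float::NAN, nil]

# nil stays nil for the numeric checks; only the nil-related checks report it.
finite   = values.map { |v| v&.finite? }
infinite = values.map { |v| v.nil? ? nil : !v.infinite?.nil? }  # Float#infinite? returns 1, -1 or nil
na       = values.map { |v| v.nil? || v.nan? }                  # nil or NaN
is_nil   = values.map(&:nil?)
valid    = values.map { |v| !v.nil? }
```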


## 45. Prime-th rows

In [140]:
# prime-th rows ... Don't ask me what it means.
require 'prime'
penguins_with_index =
  penguins.assign do
    { index: Vector.new(penguins.indices) + 1 }
  end.pick { [keys[-1], keys[0..-2]] }
penguins_with_index.slice { Vector.new(Prime.each(size).to_a) - 1 }

index,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
2,Adelie,Torgersen,39.5,17.4,186,3800,female,2007
3,Adelie,Torgersen,40.3,18.0,195,3250,female,2007
5,Adelie,Torgersen,36.7,19.3,193,3450,female,2007
7,Adelie,Torgersen,38.9,17.8,181,3625,female,2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
317,Gentoo,Biscoe,49.4,15.8,216,4925,male,2009
331,Gentoo,Biscoe,50.5,15.2,216,5000,female,2009
337,Gentoo,Biscoe,44.5,15.7,217,4875,(nil),2009
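
The index arithmetic in the block passed to `slice` can be checked on a small Array: `Prime.each(n)` yields the primes up to `n` as 1-based positions, and subtracting 1 turns them into 0-based indices. The sketch uses only Ruby's `prime` library; `rows` is stand-in data, not the penguins dataset.

```ruby
require 'prime'

rows = ('a'..'j').to_a                  # 10 stand-in rows at positions 1..10
positions = Prime.each(rows.size).to_a  # primes up to the row count
indices = positions.map { |n| n - 1 }   # convert to 0-based indices
picked = rows.values_at(*indices)       # rows at positions 2, 3, 5 and 7
```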


## 46. Slice by Enumerator

`slice` accepts an Enumerator as an argument.

In [141]:
# Select every 10th sample
penguins.slice(0.step(by: 10, to: 340))

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,37.8,17.1,186,3300,(nil),2007
Adelie,Biscoe,37.8,18.3,174,3400,female,2007
Adelie,Dream,39.5,16.7,178,3250,female,2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,48.5,15.0,219,4850,female,2009
Gentoo,Biscoe,50.5,15.2,216,5000,female,2009
Gentoo,Biscoe,46.8,14.3,215,4850,female,2009
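
To see which row indices such an enumerator stands for, it can be expanded in plain Ruby. `Integer#step` without a block returns a lazy arithmetic sequence, so nothing is computed until it is iterated. The sketch uses core Ruby only.

```ruby
# Integer#step without a block returns a lazy arithmetic sequence.
every_tenth = 0.step(by: 10, to: 340)

# Expand it to inspect the indices it will produce.
indices = every_tenth.to_a
# indices starts 0, 10, 20, ... and ends at 340 (35 values in total)
```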


## 47. Output mode

The output mode of `#inspect` and `#to_iruby` is Table mode by default. If you prefer TDR mode, set the environment variable `RED_AMBER_OUTPUT_MODE` to `"TDR"`.

In [142]:
ENV['RED_AMBER_OUTPUT_MODE'] = 'Table' # or nil (default)
penguins  # Almost same as `puts penguins.to_s` in any mode

species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
Adelie,Torgersen,40.3,18.0,195,3250,female,2007
Adelie,Torgersen,(nil),(nil),(nil),(nil),(nil),2007
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Gentoo,Biscoe,50.4,15.7,222,5750,male,2009
Gentoo,Biscoe,45.2,14.8,212,5200,female,2009
Gentoo,Biscoe,49.9,16.1,213,5400,male,2009


In [143]:
ENV['RED_AMBER_OUTPUT_MODE'] = 'TDR'
p penguins; nil # Almost same as `penguins.tdr` in any mode

#<RedAmber::DataFrame : 344 x 8 Vectors, 0x000000000000f8ac>
Vectors : 5 numeric, 3 strings
# key                type   level data_preview
1 :species           string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
2 :island            string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
 ... 5 more Vectors ...



In [144]:
penguins

RedAmber::DataFrame : 344 x 8 Vectors
Vectors : 5 numeric, 3 strings
# key                type   level data_preview
1 :species           string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
2 :island            string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
5 :flipper_length_mm uint8     56 [181, 186, 195, nil, 193, ... ], 2 nils
6 :body_mass_g       uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
7 :sex               string     3 {"male"=>168, "female"=>165, nil=>11}
8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}
