R Dataset

genmeblog edited this page Jun 22, 2020 · 1 revision

R data objects are usually converted to a Clojure data structure or object. The notes below cover typical use cases, using R's default datasets as examples.

Data Frame

Any data.frame, including tibble and data.table, is treated the same way. If row.names are available, they are converted to an additional column, :$row.names.


No row.names available.

:Time :demand
1.0 8.3
2.0 10.3
3.0 19.0
4.0 16.0
5.0 15.6
7.0 19.8


With row.names

:$row.names :Plant :Type :Treatment :conc :uptake
1 1 :Quebec :nonchilled 95.0 16.0
2 1 :Quebec :nonchilled 175.0 30.4
3 1 :Quebec :nonchilled 250.0 34.8
4 1 :Quebec :nonchilled 350.0 37.2
5 1 :Quebec :nonchilled 500.0 35.3
6 1 :Quebec :nonchilled 675.0 39.2
7 1 :Quebec :nonchilled 1000.0 39.7
8 2 :Quebec :nonchilled 95.0 13.6
9 2 :Quebec :nonchilled 175.0 27.3
10 2 :Quebec :nonchilled 250.0 37.1
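
The row.names rule can be illustrated with plain Clojure maps. This is only a sketch and does not call clojisr: the helper name `with-row-names` and the row-map representation are hypothetical, and only the :$row.names column name comes from the text above.

```clojure
(defn with-row-names
  "Adds a :$row.names entry to each row map when row names are given.
  Hypothetical helper for illustration; not part of clojisr."
  [row-names rows]
  (if (seq row-names)
    (mapv (fn [rn row] (assoc row :$row.names rn)) row-names rows)
    (vec rows)))

;; No row names: rows are returned unchanged.
(with-row-names nil [{:Time 1.0 :demand 8.3}])

;; With row names: every row gains a :$row.names key.
(with-row-names [1 2] [{:Plant 1 :conc 95.0} {:Plant 1 :conc 175.0}])
```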


A table is converted to long form, where each dimension has its own column. If dimension names are not available, columns are named with the :$col prefix (:$col-0, :$col-1, and so on). Values are stored in the last column, :$value.
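
The long form can be sketched in plain Clojure, assuming the counts arrive as a flat vector in R's column-major order (first dimension varying fastest). The function `table->long` and its argument shapes are hypothetical and are not clojisr's implementation.

```clojure
(defn table->long
  "Expands a flat, column-major vector of counts into long-form rows.
  `dims` is an ordered vector of [column-key levels] pairs, with the
  first dimension varying fastest, as in R.
  Hypothetical helper for illustration; not part of clojisr."
  [dims values]
  (let [combos (reduce (fn [acc [k levels]]
                         ;; later dimensions vary slower, so they form the outer loop
                         (for [level levels, row acc]
                           (assoc row k level)))
                       [{}]
                       dims)]
    (mapv (fn [row v] (assoc row :$value v)) combos values)))

(table->long [[:Admit ["Admitted" "Rejected"]]
              [:Gender ["Male" "Female"]]
              [:Dept ["A" "B"]]]
             [512.0 313.0 89.0 19.0 353.0 207.0 17.0 8.0])
;; first row: {:Admit "Admitted", :Gender "Male", :Dept "A", :$value 512.0}
```

For a table without dimension names, the same sketch would be called with keys such as :$col-0 and :$col-1 in place of :Admit and :Gender.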


Dimensions with names.

:Admit :Gender :Dept :$value
Admitted Male A 512.0
Rejected Male A 313.0
Admitted Female A 89.0
Rejected Female A 19.0
Admitted Male B 353.0
Rejected Male B 207.0
Admitted Female B 17.0
Rejected Female B 8.0
Admitted Male C 120.0
Rejected Male C 205.0


Dimensions without names

:$col-0 :$col-1 :$value
9.4 142.24 0
9.5 142.24 0
9.6 142.24 0
9.7 142.24 0
9.8 142.24 0
9.9 142.24 0
10 142.24 1
10.1 142.24 0
10.2 142.24 0
10.3 142.24 0

Matrices, arrays, multidimensional arrays

The idea is similar to how R displays arrays: 2D slices (matrices) are tagged with the indices of the remaining dimensions. The first two dimensions form a matrix, and every further dimension is added as a column. If names are missing, artificial column names are added. Row names are added as :$row.names.


Matrix with row and column names

:$row.names  Rural Male  Rural Female  Urban Male  Urban Female
50-54        11.7        8.7           15.4        8.4
55-59        18.1        11.7          24.3        13.6
60-64        26.9        20.3          37.0        19.3
65-69        41.0        30.9          54.6        35.1
70-74        66.0        54.3          71.1        50.0


Matrix with column names

lag quarterly revenue  price index  income level  market potential
8.79636                4.70997      5.82110       12.9699
8.79236                4.70217      5.82558       12.9733
8.79137                4.68944      5.83112       12.9774
8.81486                4.68558      5.84046       12.9806
8.81301                4.64019      5.85036       12.9831
8.90751                4.62553      5.86464       12.9854
8.93673                4.61991      5.87769       12.9900
8.96161                4.61654      5.89763       12.9943
8.96044                4.61407      5.92574       12.9992
9.00868                4.60766      5.94232       13.0033


3d array, with names in second and third dimensions

:$col-0  Sepal L.  Sepal W.  Petal L.  Petal W.
Setosa   5.1       3.5       1.4       0.2
Setosa   4.9       3.0       1.4       0.2
Setosa   4.7       3.2       1.3       0.2
Setosa   4.6       3.1       1.5       0.2
Setosa   5.0       3.6       1.4       0.2
Setosa   5.4       3.9       1.7       0.4
Setosa   4.6       3.4       1.4       0.3
Setosa   5.0       3.4       1.5       0.2
Setosa   4.4       2.9       1.4       0.2
Setosa   4.9       3.1       1.5       0.1

5D array

Created with (r/r '(array ~(range 60) :dim [2 5 1 3 2]))

:$col-0 :$col-1 :$col-2 1 2 3 4 5
1 1 1 0.0 2.0 4.0 6.0 8.0
1 1 1 1.0 3.0 5.0 7.0 9.0
1 2 1 10.0 12.0 14.0 16.0 18.0
1 2 1 11.0 13.0 15.0 17.0 19.0
1 3 1 20.0 22.0 24.0 26.0 28.0
1 3 1 21.0 23.0 25.0 27.0 29.0
1 1 2 30.0 32.0 34.0 36.0 38.0
1 1 2 31.0 33.0 35.0 37.0 39.0
1 2 2 40.0 42.0 44.0 46.0 48.0
1 2 2 41.0 43.0 45.0 47.0 49.0
1 3 2 50.0 52.0 54.0 56.0 58.0
1 3 2 51.0 53.0 55.0 57.0 59.0
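
The layout of the 5D example can be reproduced with a small plain-Clojure sketch. It assumes the data is a flat vector in R's column-major order. The function `array->rows` is hypothetical and only mirrors the shape of the output above; it is not clojisr's implementation.

```clojure
(defn array->rows
  "Splits a flat, column-major vector into the rows of its leading 2D slices.
  Each result is [extra-dimension-indices matrix-row], with 1-based indices.
  Hypothetical helper for illustration; not part of clojisr."
  [dims values]
  (let [[nrow ncol & extra] dims
        slice-size (* nrow ncol)
        ;; index tuples of the extra dimensions, first extra dimension fastest
        tags (reduce (fn [acc n]
                       (for [i (range 1 (inc n)), tag acc]
                         (conj tag i)))
                     [[]]
                     extra)]
    (vec
     (for [[s tag] (map-indexed vector tags)
           r (range nrow)]
       [tag (mapv (fn [c] (nth values (+ (* s slice-size) (* c nrow) r)))
                  (range ncol))]))))

(array->rows [2 5 1 3 2] (mapv double (range 60)))
;; first row: [[1 1 1] [0.0 2.0 4.0 6.0 8.0]]
```

The index vector corresponds to the :$col-0, :$col-1 and :$col-2 columns, and the second vector to the five matrix columns.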

1D Timeseries

A time series is stored in two columns:

  • :$time - the time identifier, as a float
  • :$series - the series values


:$time :$series
1.0 200.1
2.0 199.5
3.0 199.4
4.0 198.9
5.0 199.0
6.0 200.2
7.0 198.6
8.0 200.0
9.0 200.3
10.0 201.2
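
An R time series carries a start time and a frequency, and the time of observation i (counting from 0) is start + i / frequency. The sketch below applies that rule in plain Clojure. The function `ts->rows` is hypothetical, and whether clojisr computes :$time in exactly this way is an assumption.

```clojure
(defn ts->rows
  "Pairs each observation with its time stamp, computed as
  start + i / frequency, where i counts observations from 0.
  Hypothetical helper for illustration; not part of clojisr."
  [start frequency values]
  (mapv (fn [i v]
          {:$time (+ (double start) (/ (double i) frequency))
           :$series v})
        (range)
        values))

;; Start 1, frequency 1, as in the example above.
(ts->rows 1 1 [200.1 199.5 199.4])

;; A quarterly series: time advances by 0.25 per observation.
(ts->rows 1959 4 [10.0 11.0 12.0])
```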

Multidimensional timeseries

A multidimensional time series is converted like a multidimensional array, with an added :$time column.


:$time DAX SMI CAC FTSE
1991.49615385 1628.75 1678.1 1772.8 2443.6
1991.50000000 1613.63 1688.5 1750.5 2460.2
1991.50384615 1606.51 1678.6 1718.0 2448.2
1991.50769231 1621.04 1684.1 1708.1 2470.4
1991.51153846 1618.16 1686.6 1723.1 2484.7
1991.51538462 1610.61 1671.6 1714.3 2466.8
1991.51923077 1630.75 1682.9 1734.5 2487.9
1991.52307692 1640.17 1703.6 1757.4 2508.4
1991.52692308 1635.47 1697.5 1754.0 2510.5
1991.53076923 1645.89 1716.3 1754.3 2497.4

Datatypes with time

(r/r "
   day <- c(\"20081101\", \"20081101\", \"20081101\", \"20081101\", \"18081101\", \"20081102\", \"20081102\", \"20081102\", \"20081102\", \"20081103\")
   time <- c(\"01:20:00\", \"06:00:00\", \"12:20:00\", \"17:30:00\", \"21:45:00\", \"01:15:00\", \"06:30:00\", \"12:50:00\", \"20:00:00\", \"01:05:00\")
   dts1 <- paste(day, time)
   dts2 <- as.POSIXct(dts1, format = \"%Y%m%d %H:%M:%S\")
   dts3 <- as.POSIXlt(dts1, format = \"%Y%m%d %H:%M:%S\")
   dts <- data.frame(posixct=dts2, posixlt=dts3)") 
:posixct :posixlt
2008-11-01T01:20+01:00[Europe/Warsaw] 2008-11-01T01:20+01:00[Europe/Warsaw]
2008-11-01T06:00+01:00[Europe/Warsaw] 2008-11-01T06:00+01:00[Europe/Warsaw]
2008-11-01T12:20+01:00[Europe/Warsaw] 2008-11-01T12:20+01:00[Europe/Warsaw]
2008-11-01T17:30+01:00[Europe/Warsaw] 2008-11-01T17:30+01:00[Europe/Warsaw]
1808-11-01T21:45+01:24[Europe/Warsaw] 1808-11-01T21:45+01:24[Europe/Warsaw]
2008-11-02T01:15+01:00[Europe/Warsaw] 2008-11-02T01:15+01:00[Europe/Warsaw]
2008-11-02T06:30+01:00[Europe/Warsaw] 2008-11-02T06:30+01:00[Europe/Warsaw]
2008-11-02T12:50+01:00[Europe/Warsaw] 2008-11-02T12:50+01:00[Europe/Warsaw]
2008-11-02T20:00+01:00[Europe/Warsaw] 2008-11-02T20:00+01:00[Europe/Warsaw]
2008-11-03T01:05+01:00[Europe/Warsaw] 2008-11-03T01:05+01:00[Europe/Warsaw]
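
The printed values have the format of java.time.ZonedDateTime, although that the converted objects are of this class is an assumption here. The sketch below builds two of the timestamps directly with java.time to show where the unusual +01:24 offset in the 1808 row comes from: it is the historical local mean time that the time zone database records for Europe/Warsaw. The zone is hard-coded to match the output above. In practice it would presumably follow the time zone of the R session.

```clojure
(import '(java.time ZonedDateTime ZoneId))

(def warsaw (ZoneId/of "Europe/Warsaw"))

;; A modern date uses the standard CET offset.
(str (ZonedDateTime/of 2008 11 1 1 20 0 0 warsaw))
;; => "2008-11-01T01:20+01:00[Europe/Warsaw]"

;; A date before standard time was introduced uses local mean time.
(str (ZonedDateTime/of 1808 11 1 21 45 0 0 warsaw))
;; => "1808-11-01T21:45+01:24[Europe/Warsaw]"
```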



Named list

{:cov
 [1.0 0.846 0.805 0.859 0.473 0.398 0.301 0.382 0.846 1.0 0.881 0.826
  0.376 0.326 0.277 0.415 0.805 0.881 1.0 0.801 0.38 0.319 0.237 0.345 0.859
  0.826 0.801 1.0 0.436 0.329 0.327 0.365 0.473 0.376 0.38 0.436 1.0 0.762
  0.73 0.629 0.398 0.326 0.319 0.329 0.762 1.0 0.583 0.577 0.301 0.277 0.237
  0.327 0.73 0.583 1.0 0.539 0.382 0.415 0.345 0.365 0.629 0.577 0.539 1.0],
 :center [0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0],
 :n.obs [305.0]}

Partially named list

{:a [11.0], :b [22.0], [[3]] [33.0], [[4]] [44.0], :e [55.0], :f [66.0], [[7]] [77.0], [[8]] [88.0], :i [99.0]}
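
The key naming seen above can be sketched as follows: a named element gets a keyword, and an unnamed element gets an R-style positional label. The function `list-keys` is hypothetical, and representing the positional label as a string is an assumption, since the printed output does not show the key's type.

```clojure
(defn list-keys
  "Builds a key for every list element: a keyword for a non-empty name,
  otherwise an R-style positional label such as \"[[3]]\" (1-based).
  Hypothetical helper for illustration; not part of clojisr."
  [names]
  (vec (map-indexed (fn [i nm]
                      (if (seq nm)
                        (keyword nm)
                        (str "[[" (inc i) "]]")))
                    names)))

(list-keys ["a" "b" "" "" "e"])
;; => [:a :b "[[3]]" "[[4]]" :e]
```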