Datasets
Data can be read from a file using the read-dataset function in the incanter.io library:
(use 'incanter.io)
(def data (read-dataset "datafile.csv" :header true))
The default delimiter is a comma, but other delimiters can be specified with the :delim option.
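As an illustration of :delim, here is a minimal sketch that reads a tab-separated file. The temporary file and its contents are made up so the example does not depend on an existing datafile; the only assumption beyond the text above is that :delim takes a single character such as \tab.

```clojure
(use '(incanter core io))

;; Create a small, made-up tab-separated file in a temporary location.
(def tsv-file
  (doto (java.io.File/createTempFile "example" ".tsv")
    (.deleteOnExit)))

(spit tsv-file "x\ty\n1\t2\n3\t4\n")

;; :header true uses the first line as column names;
;; :delim \tab splits each line on tab characters instead of commas.
(def tsv-data (read-dataset (.getPath tsv-file) :header true :delim \tab))

(println (nrow tsv-data) "rows," (ncol tsv-data) "columns")
```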
Incanter comes with sample data that can be loaded using the get-dataset function from the incanter.datasets library. The get-dataset function relies on the incanter.home property, which is set to ./ by default. If you use bin/clj to start the Clojure shell (REPL) from the Incanter directory, get-dataset will be able to find the datasets in incanter/data. If you want to start the REPL from another directory, or use another environment to run it (e.g. Emacs/SLIME), you need to either pass the incanter.home property to the JVM at startup (java -Dincanter.home=$INCANTER_HOME ...) or use the :incanter-home option to get-dataset.
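If get-dataset cannot find the sample data, a quick check from the REPL is whether the JVM actually received the incanter.home property. This is a small sketch using only standard Java interop; "/path/to/incanter" is a placeholder for your own Incanter directory, and the get-dataset call is wrapped in a comment form so it is not evaluated here.

```clojure
;; nil means -Dincanter.home=... was not passed when the JVM started,
;; so get-dataset will rely on its ./ default.
(def incanter-home (System/getProperty "incanter.home"))

(println "incanter.home =" (or incanter-home "(not set)"))

;; Alternatively, point get-dataset at the installation explicitly
;; (requires (use 'incanter.datasets); the path is a placeholder).
(comment
  (get-dataset :iris :incanter-home "/path/to/incanter"))
```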
To load and view Edgar Anderson’s Iris dataset:
(use '(incanter core datasets))
(def iris (get-dataset :iris))
(view iris)
A dataset can be converted to a matrix using the to-matrix function, where non-numeric columns are converted to either numeric codes or dummy variables.
(def iris-mat (to-matrix iris))
(view iris-mat)
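The idea behind converting a category column to numeric codes can be sketched in plain Clojure: each distinct level is assigned an integer, and every value is replaced by its level's integer. The helper level-code below is hypothetical and is not Incanter's implementation; in particular, ordering the levels alphabetically and starting the codes at 0 are assumptions, and to-matrix may assign codes differently.

```clojure
;; Hypothetical helper, not part of Incanter: map each value to the
;; position of its level among the sorted distinct levels.
(defn level-code [values]
  (let [codes (zipmap (sort (distinct values)) (range))]
    (mapv codes values)))

(level-code ["setosa" "versicolor" "virginica" "setosa"])
;; => [0 1 2 0]
```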
To convert the three-level Species column into two binary dummy variables instead, use the :dummies option:
(def iris-dummy (to-matrix iris :dummies true))
(view iris-dummy)
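Why two columns for three species: with k levels, k - 1 binary columns are enough, because the remaining (reference) level is the row where every dummy is 0. The plain-Clojure sketch below shows this encoding; dummy-code is a hypothetical helper, not Incanter's implementation, and using the alphabetically first level as the reference is an assumption (Incanter may choose the reference level differently).

```clojure
;; Hypothetical helper, not part of Incanter: dummy-code a categorical
;; column, dropping the first sorted level as the reference level.
(defn dummy-code [values]
  (let [levels  (sort (distinct values))
        non-ref (rest levels)] ; the reference level gets no column
    (mapv (fn [v] (mapv #(if (= v %) 1 0) non-ref)) values)))

(dummy-code ["setosa" "versicolor" "virginica" "setosa"])
;; => [[0 0] [1 0] [0 1] [0 0]]
```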
Datasets and matrices can be written to a file using the save function:
(save iris "/tmp/iris.csv")
The default delimiter is a comma, but other delimiters can be selected with the :delim option. Dataset headers are written to the file automatically; for matrices, headers can be specified with the :header option:
(save iris-mat "/tmp/iris_mat.csv"
      :header ["Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"])
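A minimal round-trip sketch combining :delim and :header: a small made-up matrix is saved as a tab-separated file with column names, then read back with read-dataset using the same delimiter. It assumes, as with read-dataset, that :delim for save takes a single character; the temporary file is only there to keep the example self-contained.

```clojure
(use '(incanter core io))

;; A small, made-up 2x2 matrix.
(def m (matrix [[1 2] [3 4]]))

(def out-path
  (.getPath (doto (java.io.File/createTempFile "example" ".tsv")
              (.deleteOnExit))))

;; Write tab-separated values with an explicit header row for the matrix.
(save m out-path :delim \tab :header ["a" "b"])

;; Read the file back; the header row becomes the column names.
(def round-trip (read-dataset out-path :header true :delim \tab))

(println (nrow round-trip) "rows," (ncol round-trip) "columns")
```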
The :append option can be used to append to an existing file instead of overwriting it.
For further information on using datasets and matrices in Incanter see: