Datasets

liebke edited this page Sep 13, 2010 · 7 revisions

Reading Datasets

Data can be read from a file using the read-dataset function in the incanter.io library:

(use 'incanter.io)
(def data (read-dataset "datafile.csv" :header true))

The default delimiter is a comma, but other delimiters can be specified with the :delim option.
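As a minimal sketch of the :delim option, the following writes a small tab-delimited file and reads it back. The file contents and the names tsv-file and scores are made up for illustration, and passing :delim as a character literal is an assumption about how read-dataset expects the option.

```clojure
(use '(incanter core io))

;; Hypothetical sample data, written to a temp file so the example is self-contained.
(def tsv-file (java.io.File/createTempFile "incanter-example" ".tsv"))
(spit tsv-file "name\tscore\nalice\t90\nbob\t85")

;; Assumption: :delim takes a character literal such as \tab.
(def scores (read-dataset (.getPath tsv-file) :delim \tab :header true))

(nrow scores)      ; number of data rows (the header row is not counted)
(col-names scores) ; column names taken from the header row
```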

Incanter comes with sample data that can be loaded using the get-dataset function from the incanter.datasets library. The get-dataset function relies on the incanter.home property, which is set to ./ by default. If you start the Clojure shell (REPL) with bin/clj from the Incanter directory, get-dataset will find the data sets in incanter/data. If you start the REPL from another directory, or run it in another environment (e.g. emacs/slime), either pass the incanter.home property to the JVM at startup (java -Dincanter.home=$INCANTER_HOME ...) or pass the :incanter-home option to get-dataset.
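When the JVM property cannot be set at startup, the :incanter-home option might be used along these lines. This is only a sketch: it assumes an INCANTER_HOME environment variable that points at the Incanter directory, and the name incanter-home is chosen here for illustration.

```clojure
(use '(incanter core datasets))

;; Assumption: the INCANTER_HOME environment variable is set and points at
;; the Incanter directory. It is not something Incanter defines for you.
(def incanter-home (System/getenv "INCANTER_HOME"))

;; Pass the location explicitly instead of relying on -Dincanter.home.
(def iris (get-dataset :iris :incanter-home incanter-home))
```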

To load and view Edgar Anderson’s Iris dataset:

(use '(incanter core datasets))
(def iris (get-dataset :iris))
(view iris)
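The view function opens a window, so when working in a plain REPL it can help to inspect the dataset as text instead. A short sketch using dim, col-names, and sel from incanter.core:

```clojure
(use '(incanter core datasets))

(def iris (get-dataset :iris))

(dim iris)                  ; [number-of-rows number-of-columns]
(col-names iris)            ; the column names
(sel iris :rows (range 5))  ; the first five rows, as a smaller dataset
```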



Converting Datasets to Matrices

A dataset can be converted to a matrix, where non-numeric columns are converted to either
numeric codes or dummy variables, using the to-matrix function.

(def iris-mat (to-matrix iris))
(view iris-mat)



To convert the ‘Species’ column to two binary dummy variables instead of numeric codes, use the :dummies option.

(def iris-dummy (to-matrix iris :dummies true))
(view iris-dummy)
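One way to see the difference between the two conversions is to compare column counts. The sketch below rebuilds both matrices so it can be run on its own:

```clojure
(use '(incanter core datasets))

(def iris (get-dataset :iris))
(def iris-mat (to-matrix iris))
(def iris-dummy (to-matrix iris :dummies true))

(ncol iris-mat)    ; Species occupies a single column of numeric codes
(ncol iris-dummy)  ; Species is replaced by two dummy columns, so one column more
(nrow iris-dummy)  ; the number of rows is unchanged
```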

Saving Data

Datasets and matrices can be written to a file using the save function.

(save iris "/tmp/iris.csv")

The default delimiter is a comma, but other delimiters can be selected with the :delim
option. Dataset headers are written to the file automatically, but headers can be specified
for matrices with the :header option.

(save iris-mat "/tmp/iris_mat.csv"
  :header ["Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"])
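As a sketch of the :delim option on save, a dataset can be written tab-delimited and read back with read-dataset. The temp-file path and the names out-path and iris-back are made up for this example, and giving :delim as a character literal is an assumption.

```clojure
(use '(incanter core io datasets))

(def iris (get-dataset :iris))

;; Hypothetical output location: a fresh temp file.
(def out-path (.getPath (java.io.File/createTempFile "iris" ".tsv")))

;; Write with tabs instead of commas; the dataset header is written automatically.
(save iris out-path :delim \tab)

;; Read the file back using the same delimiter.
(def iris-back (read-dataset out-path :delim \tab :header true))

(= (dim iris) (dim iris-back)) ; same shape after the round trip
```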

The :append option can be used to append instead of overwriting an existing file.
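A small sketch of :append, assuming the option takes a boolean. The helper line-count and the temp-file path are hypothetical names used only here. It is worth checking whether your Incanter version repeats the header row in the appended output.

```clojure
(use '(incanter core io datasets))
(require '[clojure.string :as string])

(def iris (get-dataset :iris))

;; Hypothetical output location: a fresh temp file.
(def out-path (.getPath (java.io.File/createTempFile "iris-append" ".csv")))

;; Hypothetical helper: count the lines in a text file.
(defn line-count [path]
  (count (string/split-lines (slurp path))))

(save iris out-path)                ; overwrites the file
(def lines-before (line-count out-path))

(save iris out-path :append true)   ; adds to the end of the existing file
(def lines-after (line-count out-path))

(> lines-after lines-before)        ; the file grew instead of being replaced
```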
