farff: A faster ARFF parser.


This is a subproject for better file handling with mlr and OpenML.

Installation instructions

Please install the CRAN release in the usual way, i.e., with install.packages("farff"). If you absolutely have to install the development version from GitHub (you should not):

devtools::install_github("mlr-org/farff")

What is ARFF?

ARFF files are like CSV files with some added meta information in a header (the relation name plus the name and type of every attribute) and a standardized NA value, "?". They are often used for machine learning data sets and were introduced for WEKA, the Java machine learning toolbox.
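To make the format concrete, here is a small illustrative snippet that writes a hypothetical toy ARFF file from R. The data set, the attribute names x and class, and the temporary path are made up for this example; the optional last step reads the file back with farff's readARFF if the package is installed.

```r
# Hypothetical toy data set in ARFF format: the header declares the relation
# and the name and type of each attribute; "?" marks a missing value.
arff = c(
  "% comment lines start with a percent sign",
  "@RELATION toy",
  "@ATTRIBUTE x NUMERIC",
  "@ATTRIBUTE class {yes,no}",   # nominal attribute with two levels
  "@DATA",
  "1.5,yes",
  "?,no"                         # x is missing in this row
)
path = tempfile(fileext = ".arff")
writeLines(arff, path)

# If farff is installed, read the file back into a data.frame.
if (requireNamespace("farff", quietly = TRUE)) {
  d = farff::readARFF(path)
  print(d)
}
```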

Don't RWeka's read.arff and write.arff already exist?

Several reasons motivated the development of farff:

  • The Java dependency of RWeka is annoying.
  • The I/O code in RWeka is rather slow; reading files in particular is much faster with farff.

How does it work?

library(farff)
# export a data.frame to an ARFF file
writeARFF(iris, path = "iris.arff")
# import the ARFF file back into a data.frame
d = readARFF("iris.arff")

How does it work under the hood?

  • We read the ARFF header with pure R code.
  • We preprocess the data section with custom C code and write the result to a temporary file TEMP.
  • The TEMP file, i.e., the preprocessed data section, is parsed with readr::read_delim. Support for data.table::fread is planned for future releases.
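The three steps can be illustrated with a simplified, dependency-free sketch in base R. This is not farff's implementation: the hypothetical function read_arff_sketch skips the C preprocessing (the data lines are copied to the temporary file unchanged), uses utils::read.table instead of readr::read_delim, and assumes a simple dense ARFF file without sparse instances or quoted attribute names.

```r
# Hypothetical helper, NOT farff's code: mirrors the header / temp file / delimited
# parse approach using only base R.
read_arff_sketch = function(path) {
  lines = trimws(readLines(path, warn = FALSE))
  # Step 1: parse the header in pure R; drop blank lines and % comments.
  lines = lines[nzchar(lines) & !startsWith(lines, "%")]
  data_pos = which(toupper(lines) == "@DATA")
  header = lines[seq_len(data_pos - 1L)]
  attr_lines = header[startsWith(toupper(header), "@ATTRIBUTE")]
  # "@ATTRIBUTE name type": assumes names contain no spaces or quotes.
  parts = strsplit(attr_lines, "[[:space:]]+")
  col_names = vapply(parts, `[`, character(1), 2L)
  col_types = vapply(parts, function(p) paste(p[-(1:2)], collapse = " "), character(1))

  # Step 2: write the data section to a temporary file (no C preprocessing here).
  tmp = tempfile(fileext = ".csv")
  writeLines(lines[-seq_len(data_pos)], tmp)

  # Step 3: parse the temporary file as delimited text; "?" becomes NA.
  is_numeric = toupper(col_types) %in% c("NUMERIC", "REAL", "INTEGER")
  d = utils::read.table(tmp, sep = ",", col.names = col_names, na.strings = "?",
    colClasses = ifelse(is_numeric, "numeric", "character"),
    quote = "'\"", comment.char = "", strip.white = TRUE)
  unlink(tmp)

  # Nominal attributes {a,b,...} become factors with the declared levels.
  for (j in which(startsWith(col_types, "{"))) {
    levs = trimws(strsplit(gsub("[{}]", "", col_types[j]), ",")[[1]])
    d[[j]] = factor(d[[j]], levels = levs)
  }
  d
}

# Usage on a hypothetical toy file.
path = tempfile(fileext = ".arff")
writeLines(c("@RELATION toy", "@ATTRIBUTE x NUMERIC",
  "@ATTRIBUTE class {yes,no}", "@DATA", "1.5,yes", "?,no"), path)
d = read_arff_sketch(path)
str(d)
```

Writing the data section to a temporary file lets a general-purpose delimited-text reader do the heavy lifting, while the ARFF-specific parts (header, "?" as NA, nominal levels) stay in a thin R layer.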