RScythica Dataframes, also called S Dataframes, are large-scale, read-only, memory-mapped, and eventually distributed data frames that are both partitioned and broken into large splits. S Dataframes are created using external utilities.
LICENSE: LGPL 2.1
Note: RScythica is under development and is not production ready. Interfaces are evolving and subject to change.
Note2: Some functions currently require C++ support for SSE 4.1 instructions.
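To check the SSE 4.1 requirement before building, a shell sketch along these lines may help. It assumes that Linux lists CPU flags in /proc/cpuinfo (as `sse4_1`) and that Intel macOS reports them through `sysctl machdep.cpu.features` (as `SSE4.1`). The helper names `has_sse41` and `cpu_flags` are made up for this example and are not part of RScythica.

```shell
#!/bin/sh
# Report whether the CPU advertises SSE 4.1.
# Assumption: Linux exposes flags in /proc/cpuinfo, Intel macOS via sysctl.

# has_sse41 FLAGS: succeed if the flag list contains an SSE 4.1 marker.
has_sse41() {
    printf '%s\n' "$1" | grep -Eqi '(^|[[:space:]])sse4[._]1([[:space:]]|$)'
}

# cpu_flags: print the CPU feature flags for the current platform, if known.
cpu_flags() {
    case "$(uname -s)" in
        Linux)  grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2 ;;
        Darwin) sysctl -n machdep.cpu.features 2>/dev/null ;;
        *)      : ;;
    esac
}

if has_sse41 "$(cpu_flags)"; then
    echo "SSE 4.1: available"
else
    echo "SSE 4.1: not detected"
fi
```

On platforms where neither source exists, the script prints "not detected". That result only means the check could not confirm support.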
- S Dataframes consist of multiple binary files that hold vector data and can be mapped directly into an R process.
- S Dataframes are partitioned.
- Partitions are further broken into large splits for efficiency.
For more details, check the R package documentation.
RScythica depends on the scythica utilities to create the S datasets, and the utilities need to be available in the executable search path.
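One way to confirm that the utilities are reachable is to look up `sdscreate` on the search path. This is a minimal sketch using the POSIX `command -v` builtin. `check_tool` is a hypothetical helper name used only here.

```shell
#!/bin/sh
# check_tool NAME: succeed if NAME resolves to a command in the current PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

if check_tool sdscreate; then
    echo "sdscreate found at: $(command -v sdscreate)"
else
    echo "sdscreate not found; add the scythica utilities directory to PATH" >&2
fi
```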
sudo apt-get install libyaml-cpp-dev
sudo apt-get install libmsgpack-dev
The following packages are required:
- Xcode
- Brew packages:
brew install boost
brew install yaml-cpp
brew install msgpack
library(devtools)
devtools::install_github("geraldthewes/RScythica")
The package includes source data that needs to be converted to binary form by running sdscreate in the tests/extdata directory:
sdscreate airline.yaml airline.sds airline.csv
sdscreate -noheader=true iris.yaml iris.sds iris.data
sdscreate boston.yaml boston.sds boston-1970-2014.csv
sdscreate PRECIP_15_sample_csv.yaml noaa.sds PRECIP_15_sample_csv.csv
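The same four conversions can be wrapped in a small script that skips any dataset whose schema or source file is missing. This is only a sketch, meant to be run from tests/extdata. The `convert` function and the `SDSCREATE` variable are hypothetical conveniences for this example (the variable allows a dry run with `SDSCREATE=echo`) and are not part of the scythica utilities.

```shell
#!/bin/sh
# Run the sdscreate conversions, skipping datasets with missing inputs.
# SDSCREATE is a hypothetical override for dry runs, e.g. SDSCREATE=echo.
SDSCREATE="${SDSCREATE:-sdscreate}"

# convert SCHEMA OUTPUT INPUT [FLAGS...]
convert() {
    schema=$1; output=$2; input=$3
    shift 3
    if [ ! -f "$schema" ] || [ ! -f "$input" ]; then
        echo "skip: $output (missing $schema or $input)"
        return 0
    fi
    # Any flags are passed before the positional arguments, as in the
    # commands listed above.
    "$SDSCREATE" "$@" "$schema" "$output" "$input"
}

convert airline.yaml airline.sds airline.csv
convert iris.yaml iris.sds iris.data -noheader=true
convert boston.yaml boston.sds boston-1970-2014.csv
convert PRECIP_15_sample_csv.yaml noaa.sds PRECIP_15_sample_csv.csv
```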
The Doxygen documentation for the C++ code can be generated with:
doxygen
cd latex
make
This creates an HTML version at HTML/index.html and a PDF version in the latex subdirectory.
Current limitations include:
- Limited testing or error handling
- Only supports POSIX systems such as MacOS X and Linux
- Currently supports the following data types:
  - integer (32-bit)
  - numeric (64-bit double)
  - factor
  - logical
  - Date
  - DateTime (POSIXct)
- Currently requires SSE 4.1 support to accelerate vector operations