forked from felixr/sparsity
Complete redesign and bunch of bug fixes. (rcppeigen no more needed, liblinear descoped - focus only on I/O operations)
1 parent
4a16451
commit f00bbb0
Showing 51 changed files with 457 additions and 5,825 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
^.*\.Rproj$ | ||
^\.Rproj\.user$ |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,3 +10,4 @@ | |
*.so | ||
|
||
.*.sw[po] | ||
.Rproj.user |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,25 @@ | ||
Package: sparsity | ||
Package: sparsio | ||
Type: Package | ||
Title: What the package does (short line) | ||
Version: 1.0 | ||
Date: 2013-06-22 | ||
Author: Felix Riedel | ||
Maintainer: Felix Riedel <felix.riedel@gmail.com> | ||
Description: More about what it does (maybe more than one line) | ||
License: BSD License | ||
Title: I/O operations with sparse matrices | ||
Version: 2.0 | ||
Date: 2017-04-13 | ||
Authors@R: c(person("Dmitriy", "Selivanov", role = c("aut", "cre"), | ||
email = "selivanov.dmitriy@gmail.com"), | ||
person("Felix", "Riedel", role = c("aut"), | ||
email = "felix.riedel@gmail.com")) | ||
Maintainer: Dmitriy Selivanov <selivanov.dmitriy@gmail.com> | ||
Encoding: UTF-8 | ||
Description: Fast SVMlight reader and writer. | ||
License: BSD_3_clause + file LICENSE | ||
Depends: | ||
Rcpp (>= 0.10.3), | ||
RcppEigen (>= 0.3.1) | ||
LinkingTo: Rcpp, RcppEigen | ||
Collate: | ||
'liblinear.r' | ||
'RcppExports.R' | ||
'sparsity-io.R' | ||
R (>= 3.1.0), | ||
methods | ||
Imports: | ||
Rcpp (>= 0.12.0), | ||
Matrix (>= 1.1) | ||
LinkingTo: Rcpp | ||
Suggests: | ||
testthat | ||
URL: https://github.com/dselivanov/sparsio | ||
BugReports: https://github.com/dselivanov/sparsio/issues | ||
RoxygenNote: 5.0.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
YEAR: 2013, 2017 | ||
COPYRIGHT HOLDER: Dmitriy Selivanov <selivanov.dmitriy@gmail.com>, Felix Riedel <felix.riedel@gmail.com> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,8 @@ | ||
export(liblinear) | ||
export(liblinear.new) | ||
export(read.svmlight) | ||
export(write.svmlight) | ||
S3method(liblinear,dgCMatrix) | ||
S3method(liblinear,liblinearProblem) | ||
S3method(predict,liblinear) | ||
S3method(print,liblinear) | ||
useDynLib(sparsity,sparsity_createProblemInstance) | ||
useDynLib(sparsity,sparsity_liblinearTrain) | ||
useDynLib(sparsity,sparsity_readSvmLight) | ||
useDynLib(sparsity,sparsity_writeSvmLight) | ||
# Generated by roxygen2: do not edit by hand | ||
|
||
export(read_svmlight) | ||
export(write_svmlight) | ||
import(Matrix) | ||
import(Rcpp) | ||
import(methods) | ||
useDynLib(sparsio) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,28 +1,11 @@ | ||
# This file was generated by Rcpp::compileAttributes | ||
# Generated by using Rcpp::compileAttributes() -> do not edit by hand | ||
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 | ||
|
||
createProblemInstance <- function(inputMatrix, labels) { | ||
.Call('sparsity_createProblemInstance', PACKAGE = 'sparsity', inputMatrix, labels) | ||
read_svmlight_cpp <- function(filename, zero_based = 1L) { | ||
.Call('sparsio_read_svmlight_cpp', PACKAGE = 'sparsio', filename, zero_based) | ||
} | ||
|
||
liblinearTrain <- function(problemPtr, solver_type, cost, epsilon, quiet) { | ||
.Call('sparsity_liblinearTrain', PACKAGE = 'sparsity', problemPtr, solver_type, cost, epsilon, quiet) | ||
} | ||
|
||
#' Reads a sparse matrix from a SVMlight compatible file | ||
#' @param fileName input file name | ||
#' @return list with a sparse matrix and a list of labels | ||
readSvmLight <- function(filename) { | ||
.Call('sparsity_readSvmLight', PACKAGE = 'sparsity', filename) | ||
} | ||
|
||
#' Writes a sparse matrix to a SVMlight compatible file | ||
#' | ||
#' @param inputMatrix sparse matrix | ||
#' @param labels list of numeric labels for each row in the matrix | ||
#' @param fileName output file name | ||
#' @return list with debug information | ||
writeSvmLight <- function(inputMatrix, labels, fileName) { | ||
.Call('sparsity_writeSvmLight', PACKAGE = 'sparsity', inputMatrix, labels, fileName) | ||
write_svmlight_cpp <- function(x, y, filename, zero_based = 1L) { | ||
invisible(.Call('sparsio_write_svmlight_cpp', PACKAGE = 'sparsio', x, y, filename, zero_based)) | ||
} | ||
|
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
#' @useDynLib sparsio | ||
#' @import Matrix | ||
#' @import Rcpp | ||
#' @import methods | ||
#' | ||
#' @name svmlight | ||
#' @title Fast svmlight reader and writer | ||
#' @description Reads and writes svmlight files. | ||
#' @param x input sparse matrix. Should inherit from \code{Matrix::sparseMatrix}. | ||
#' @param y target values. Labels must be an integer or numeric of the same length as number of rows in \code{x}. | ||
#' @param file string, path to svmlight file | ||
#' @param type target class for sparse matrix. \code{CsparseMatrix} is the default because it | ||
#' is the primary sparse format in R's \code{Matrix} package. Internally, however, the matrix is first read into an \code{RsparseMatrix} | ||
#' and then coerced with \code{as()} to the target type. | ||
#' This is because the \code{svmlight} format is essentially equivalent to the \code{CSR} sparse matrix format. | ||
#' @param zero_based \code{logical}, whether column indices in file are 0-based (\code{TRUE}) or 1-based (\code{FALSE}). | ||
#' @param ncol number of columns in target matrix. \code{NULL} means that the number of columns will be determined | ||
#' from the file (as the maximum index). However, the user may expect a matrix with a predefined number of columns, | ||
#' so this argument can override the value inferred from the data. | ||
#' @examples | ||
#' library(Matrix) | ||
#' library(sparsio) | ||
#' i = 1:8 | ||
#' j = 1:8 | ||
#' v = rep(2, 8) | ||
#' x = sparseMatrix(i, j, x = v) | ||
#' y = sample(c(0, 1), nrow(x), replace = TRUE) | ||
#' f = tempfile(fileext = ".svmlight") | ||
#' write_svmlight(x, y, f) | ||
#' x2 = read_svmlight(f, type = "CsparseMatrix") | ||
#' identical(x2$x, x) | ||
#' identical(x2$y, y) | ||
|
||
#' @rdname svmlight | ||
#' @export | ||
read_svmlight = function(file, type = c("CsparseMatrix", "RsparseMatrix", "TsparseMatrix"), zero_based = TRUE, ncol = NULL) { | ||
stopifnot(is.logical(zero_based)) | ||
type = match.arg(type) | ||
stopifnot(is.character(file) && length(file) == 1) | ||
if(!is.null(ncol)) { | ||
stopifnot(is.numeric(ncol) && length(ncol) == 1) | ||
} | ||
|
||
file = path.expand(file) | ||
if (!file.exists(file)) stop(sprintf("File %s does not exist.", file)) | ||
res = read_svmlight_cpp(file, zero_based) | ||
|
||
if (!is.null(ncol)) { | ||
ncol_discovered = ncol(res$x) | ||
ncol_provided = as.integer(ncol) | ||
if (ncol_discovered > ncol_provided) | ||
stop(sprintf("input contains at least %d columns while user provided %d as 'ncol'", ncol_discovered, ncol_provided)) | ||
res$x@Dim = c(nrow(res$x), ncol_provided) | ||
} | ||
|
||
if(type != "RsparseMatrix") | ||
res$x = as(res$x, type) | ||
|
||
res | ||
} | ||
|
||
#' @rdname svmlight | ||
#' @export | ||
write_svmlight = function(x, y = rep(0, nrow(x)), file, zero_based = TRUE) { | ||
stopifnot(inherits(x, "sparseMatrix")) | ||
stopifnot(is.logical(zero_based)) | ||
stopifnot(is.numeric(y)) | ||
stopifnot(length(y) == nrow(x)) | ||
stopifnot(is.character(file) && length(file) == 1) | ||
|
||
file = path.expand(file) | ||
|
||
if(!inherits(x, "RsparseMatrix")) { | ||
x = try(as(x, "RsparseMatrix")) | ||
if(inherits(x, "try-error")) | ||
stop("can't convert input into 'RsparseMatrix' class in order to write it to svmlight") | ||
} | ||
|
||
write_svmlight_cpp(x, y, file, zero_based) | ||
invisible(TRUE) | ||
} | ||
|
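The `type` documentation above says that the svmlight format is essentially CSR. A minimal base-R sketch of that correspondence follows. It is not the package's C++ reader: the helper name `parse_svmlight_lines` is hypothetical, and it assumes well-formed `label index:value ...` lines with no comments or `qid` fields.

```r
# Hypothetical helper (not part of sparsio): turn svmlight text lines into
# CSR-style arrays. Assumes well-formed "label idx:val ..." lines.
parse_svmlight_lines = function(lines, zero_based = TRUE) {
  y = numeric(length(lines))          # one label per row
  p = integer(length(lines) + 1L)     # row pointers, starting at 0
  j = integer(0)                      # 0-based column indices
  x = numeric(0)                      # non-zero values
  for (k in seq_along(lines)) {
    tokens = strsplit(trimws(lines[k]), "[[:space:]]+")[[1]]
    y[k] = as.numeric(tokens[1])
    pairs = strsplit(tokens[-1], ":", fixed = TRUE)
    idx = as.integer(vapply(pairs, `[`, character(1), 1L))
    val = as.numeric(vapply(pairs, `[`, character(1), 2L))
    if (!zero_based) idx = idx - 1L   # normalise to 0-based indices
    j = c(j, idx)
    x = c(x, val)
    p[k + 1L] = p[k] + length(idx)    # each row extends the pointer array
  }
  ncol = if (length(j) > 0) max(j) + 1L else 0L
  list(y = y, p = p, j = j, x = x, ncol = ncol)
}

lines = c("1 0:2 3:1.5", "0 1:4", "1")
res = parse_svmlight_lines(lines)
str(res)
```

Each input line contributes one row, so the file can be consumed in a single pass while appending to `p`, `j`, and `x`. Those are also the slot names of a `dgRMatrix` in the `Matrix` package, which is presumably why the reader builds an `RsparseMatrix` first and coerces afterwards.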
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,30 @@ | ||
# sparsity | ||
## sparsio | ||
|
||
*sparsity* is an R package with functions for sparse matrices. | ||
**sparsio** is an R package for **I/O** operations with sparse matrices. At the moment it provides a **fast** `svmlight` reader and writer. | ||
|
||
## Why use sparsity | ||
* `read_svmlight()` | ||
* `write_svmlight()` | ||
|
||
### Reading and writing SVMlight format | ||
**The only dependency is `Rcpp`** | ||
|
||
`read.svmlight()` and `write.svmlight()` read/write sparse matrices in SVMlight format. | ||
You will find other functions for this on the internet, but the ones I found were either slow or handled only dense (=normal) matrices. | ||
|
||
### LIBLINEAR integration | ||
|
||
The [LiblineaR CRAN package](http://cran.r-project.org/web/packages/LiblineaR/) provides an R interface to the [LIBLINEAR library](http://www.csie.ntu.edu.tw/~cjlin/liblinear/), but uses a dense representation. *sparsity*'s functions use sparse matrices (from the Matrix package) instead. In addition it gives you a pointer to LIBLINEAR's internal representation of the data, which means you can train multiple models without the overhead of transforming the input data. | ||
The package is not on CRAN yet, but you can install it with `devtools`: | ||
```r | ||
devtools::install_github("dselivanov/sparsio") | ||
``` | ||
|
||
## Installation | ||
## Quick reference | ||
|
||
```r | ||
# install.packages("devtools") | ||
library(devtools) | ||
install_github("sparsity", "felixr") | ||
``` | ||
library(Matrix) | ||
library(sparsio) | ||
i = 1:8 | ||
j = 1:8 | ||
v = rep(2, 8) | ||
x = sparseMatrix(i, j, x = v) | ||
y = sample(c(0, 1), nrow(x), replace = TRUE) | ||
f = tempfile(fileext = ".svmlight") | ||
write_svmlight(x, y, f) | ||
x2 = read_svmlight(f, type = "CsparseMatrix") | ||
identical(x2$x, x) | ||
identical(x2$y, y) | ||
``` |
This file was deleted.
This file was deleted.
This file was deleted.
This file was deleted.