Complete redesign and bunch of bug fixes. (rcppeigen no more needed, liblinear descoped - focus only on I/O operations)
dselivanov committed Apr 13, 2017
1 parent 4a16451 commit f00bbb0
Showing 51 changed files with 457 additions and 5,825 deletions.
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -0,0 +1,2 @@
^.*\.Rproj$
^\.Rproj\.user$
1 change: 1 addition & 0 deletions .gitignore
@@ -10,3 +10,4 @@
*.so

.*.sw[po]
.Rproj.user
38 changes: 23 additions & 15 deletions DESCRIPTION
@@ -1,17 +1,25 @@
Package: sparsity
Package: sparsio
Type: Package
Title: What the package does (short line)
Version: 1.0
Date: 2013-06-22
Author: Felix Riedel
Maintainer: Felix Riedel <felix.riedel@gmail.com>
Description: More about what it does (maybe more than one line)
License: BSD License
Title: I/O operations with sparse matrices
Version: 2.0
Date: 2017-04-13
Authors@R: c(person("Dmitriy", "Selivanov", role = c("aut", "cre"),
email = "selivanov.dmitriy@gmail.com"),
person("Felix", "Riedel", role = c("aut"),
email = "felix.riedel@gmail.com"))
Maintainer: Dmitriy Selivanov <selivanov.dmitriy@gmail.com>
Encoding: UTF-8
Description: Fast SVMlight reader and writer.
License: BSD_3_clause + file LICENSE
Depends:
Rcpp (>= 0.10.3),
RcppEigen (>= 0.3.1)
LinkingTo: Rcpp, RcppEigen
Collate:
'liblinear.r'
'RcppExports.R'
'sparsity-io.R'
R (>= 3.1.0),
methods
Imports:
Rcpp (>= 0.12.0),
Matrix (>= 1.1)
LinkingTo: Rcpp
Suggests:
testthat
URL: https://github.com/dselivanov/sparsio
BugReports: https://github.com/dselivanov/sparsio/issues
RoxygenNote: 5.0.1
2 changes: 2 additions & 0 deletions LICENSE
@@ -0,0 +1,2 @@
YEAR: 2013, 2017
COPYRIGHT HOLDER: Dmitriy Selivanov <selivanov.dmitriy@gmail.com>, Felix Riedel <felix.riedel@gmail.com>
20 changes: 8 additions & 12 deletions NAMESPACE
@@ -1,12 +1,8 @@
export(liblinear)
export(liblinear.new)
export(read.svmlight)
export(write.svmlight)
S3method(liblinear,dgCMatrix)
S3method(liblinear,liblinearProblem)
S3method(predict,liblinear)
S3method(print,liblinear)
useDynLib(sparsity,sparsity_createProblemInstance)
useDynLib(sparsity,sparsity_liblinearTrain)
useDynLib(sparsity,sparsity_readSvmLight)
useDynLib(sparsity,sparsity_writeSvmLight)
# Generated by roxygen2: do not edit by hand

export(read_svmlight)
export(write_svmlight)
import(Matrix)
import(Rcpp)
import(methods)
useDynLib(sparsio)
27 changes: 5 additions & 22 deletions R/RcppExports.R
@@ -1,28 +1,11 @@
# This file was generated by Rcpp::compileAttributes
# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

createProblemInstance <- function(inputMatrix, labels) {
.Call('sparsity_createProblemInstance', PACKAGE = 'sparsity', inputMatrix, labels)
read_svmlight_cpp <- function(filename, zero_based = 1L) {
.Call('sparsio_read_svmlight_cpp', PACKAGE = 'sparsio', filename, zero_based)
}

liblinearTrain <- function(problemPtr, solver_type, cost, epsilon, quiet) {
.Call('sparsity_liblinearTrain', PACKAGE = 'sparsity', problemPtr, solver_type, cost, epsilon, quiet)
}

#' Reads a sparse matrix from a SVMlight compatible file
#' @param fileName input file name
#' @return list with a sparse matrix and a list of labels
readSvmLight <- function(filename) {
.Call('sparsity_readSvmLight', PACKAGE = 'sparsity', filename)
}

#' Writes a sparse matrix to a SVMlight compatible file
#'
#' @param inputMatrix sparse matrix
#' @param labels list of numeric labels for each row in the matrix
#' @param fileName output file name
#' @return list with debug information
writeSvmLight <- function(inputMatrix, labels, fileName) {
.Call('sparsity_writeSvmLight', PACKAGE = 'sparsity', inputMatrix, labels, fileName)
write_svmlight_cpp <- function(x, y, filename, zero_based = 1L) {
invisible(.Call('sparsio_write_svmlight_cpp', PACKAGE = 'sparsio', x, y, filename, zero_based))
}
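
For orientation, the way `zero_based` affects reading can be approximated in plain R. This is only a sketch: `parse_svmlight_lines` is a hypothetical helper, not part of sparsio, and it does not reproduce the C++ code behind `read_svmlight_cpp`. It assumes each line has the form `label index:value index:value ...` and mirrors the `ncol` override from the R-level wrapper in this commit.

```r
library(Matrix)

# Hypothetical helper for illustration only; not part of sparsio.
parse_svmlight_lines = function(lines, zero_based = TRUE, ncol = NULL) {
  tokens = strsplit(trimws(lines), "[[:space:]]+")
  # the first token of every line is the label
  y = vapply(tokens, function(tk) as.numeric(tk[1]), numeric(1))
  # the remaining tokens are "index:value" pairs
  feats = lapply(tokens, function(tk) tk[-1])
  i = rep(seq_along(feats), lengths(feats))
  pairs = strsplit(as.character(unlist(feats)), ":", fixed = TRUE)
  idx = vapply(pairs, function(p) as.integer(p[1]), integer(1))
  val = vapply(pairs, function(p) as.numeric(p[2]), numeric(1))
  # Matrix::sparseMatrix() expects 1-based indices, so shift 0-based input
  j = if (zero_based) idx + 1L else idx
  n_found = if (length(j) > 0) max(j) else 0L
  if (is.null(ncol)) ncol = n_found
  if (n_found > ncol) stop("input contains more columns than 'ncol'")
  list(x = sparseMatrix(i = i, j = j, x = val, dims = c(length(lines), ncol)), y = y)
}

txt = c("1 0:2 2:5", "0", "1 1:7")
a = parse_svmlight_lines(txt)            # number of columns inferred from the data
b = parse_svmlight_lines(txt, ncol = 5)  # number of columns forced by the caller
print(dim(a$x))
print(dim(b$x))
```

The second line of `txt` carries a label but no features, which is why the row count is taken from the number of lines rather than from the largest row index seen.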

84 changes: 0 additions & 84 deletions R/liblinear.r

This file was deleted.

82 changes: 82 additions & 0 deletions R/sparsio.R
@@ -0,0 +1,82 @@
#' @useDynLib sparsio
#' @import Matrix
#' @import Rcpp
#' @import methods
#'
#' @name svmlight
#' @title Fast svmlight reader and writer
#' @description Reads and writes svmlight files.
#' @param x input sparse matrix. Should inherit from \code{Matrix::sparseMatrix}.
#' @param y target values. Must be an integer or numeric vector whose length equals the number of rows in \code{x}.
#' @param file string, path to svmlight file
#' @param type target class for sparse matrix. \code{CsparseMatrix} is the default because it
#' is the main sparse class in R's \code{Matrix} package. Internally, however, the matrix is first read into an \code{RsparseMatrix}
#' and then coerced with \code{as()} to the target type.
#' This is because the \code{svmlight} format is essentially equivalent to the \code{CSR} sparse matrix format.
#' @param zero_based \code{logical}, whether column indices in file are 0-based (\code{TRUE}) or 1-based (\code{FALSE}).
#' @param ncol number of columns in target matrix. \code{NULL} means that the number of columns will be determined
#' from the file (as the maximum index). If the caller expects a matrix with a predefined number of columns,
#' this argument overrides the value inferred from the data.
#' @examples
#' library(Matrix)
#' library(sparsio)
#' i = 1:8
#' j = 1:8
#' v = rep(2, 8)
#' x = sparseMatrix(i, j, x = v)
#' y = sample(c(0, 1), nrow(x), replace = TRUE)
#' f = tempfile(fileext = ".svmlight")
#' write_svmlight(x, y, f)
#' x2 = read_svmlight(f, type = "CsparseMatrix")
#' identical(x2$x, x)
#' identical(x2$y, y)

#' @rdname svmlight
#' @export
read_svmlight = function(file, type = c("CsparseMatrix", "RsparseMatrix", "TsparseMatrix"), zero_based = TRUE, ncol = NULL) {
stopifnot(is.logical(zero_based))
type = match.arg(type)
stopifnot(is.character(file) && length(file) == 1)
if(!is.null(ncol)) {
stopifnot(is.numeric(ncol) && length(ncol) == 1)
}

file = path.expand(file)
if (!file.exists(file)) stop(sprintf("File %s does not exist.", file))
res = read_svmlight_cpp(file, zero_based)

if (!is.null(ncol)) {
ncol_discovered = ncol(res$x)
ncol_provided = as.integer(ncol)
if (ncol_discovered > ncol_provided)
stop(sprintf("input contains at least %d columns while user provided %d as 'ncol'", ncol_discovered, ncol_provided))
res$x@Dim = c(nrow(res$x), ncol_provided)
}

if(type != "RsparseMatrix")
res$x = as(res$x, type)

res
}

#' @rdname svmlight
#' @export
write_svmlight = function(x, y = rep(0, nrow(x)), file, zero_based = TRUE) {
stopifnot(inherits(x, "sparseMatrix"))
stopifnot(is.logical(zero_based))
stopifnot(is.numeric(y))
stopifnot(length(y) == nrow(x))
stopifnot(is.character(file) && length(file) == 1)

file = path.expand(file)

if(!inherits(x, "RsparseMatrix")) {
x = try(as(x, "RsparseMatrix"))
if(inherits(x, "try-error"))
stop("can't convert input into 'RsparseMatrix' class in order to write it to svmlight")
}

write_svmlight_cpp(x, y, file, zero_based)
invisible(TRUE)
}
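
The roxygen note above says the svmlight format is essentially CSR. A minimal sketch of that correspondence in plain R follows. `to_svmlight_lines` is a hypothetical helper, not part of sparsio, and it is not the C++ writer from this commit. It relies only on the `p`, `j` and `x` slots of an `RsparseMatrix` from the `Matrix` package.

```r
library(Matrix)

# Hypothetical helper for illustration only; not part of sparsio.
to_svmlight_lines = function(x, y, zero_based = TRUE) {
  # CSR layout: p = row pointers, j = 0-based column indices, x = values
  x = as(x, "RsparseMatrix")
  offset = if (zero_based) 0L else 1L
  vapply(seq_len(nrow(x)), function(r) {
    # entries of row r sit at positions p[r] + 1 .. p[r + 1] of j and x
    idx = seq_len(x@p[r + 1L] - x@p[r]) + x@p[r]
    feats = sprintf("%d:%s", x@j[idx] + offset, as.character(x@x[idx]))
    # a row without entries yields just the label
    trimws(paste(y[r], paste(feats, collapse = " ")))
  }, character(1))
}

m = sparseMatrix(i = c(1, 1, 3), j = c(1, 3, 2), x = c(2, 5, 7), dims = c(3, 3))
out0 = to_svmlight_lines(m, y = c(1, 0, 1), zero_based = TRUE)
out1 = to_svmlight_lines(m, y = c(1, 0, 1), zero_based = FALSE)
print(out0)
print(out1)
```

Each output line is one row of the CSR structure, so a writer can stream rows without reshuffling the data. That is presumably why `write_svmlight` coerces its input to `RsparseMatrix` first.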

29 changes: 0 additions & 29 deletions R/sparsity-io.R

This file was deleted.

38 changes: 23 additions & 15 deletions README.md
@@ -1,22 +1,30 @@
# sparsity
## sparsio

*sparsity* is an R package with functions for sparse matrices.
**sparsio** is an R package for **I/O** operations with sparse matrices. At the moment it provides a **fast** `svmlight` reader and writer.

## Why use sparsity
* `read_svmlight()`
* `write_svmlight()`

### Reading and writing SVMlight format
**The only dependency is `Rcpp`**

`read.svmlight()` and `write.svmlight()` read/write sparse matrices in SVMlight format.
You will find other functions for this on the internet, but the ones I found were either slow or handled only dense (=normal) matrices.

### LIBLINEAR integration

The [LiblineaR CRAN package](http://cran.r-project.org/web/packages/LiblineaR/) provides an R interface to the [LIBLINEAR library](http://www.csie.ntu.edu.tw/~cjlin/liblinear/), but uses a dense representation. *sparsity*'s functions use sparse matrices (from the Matrix package) instead. In addition it gives you a pointer to LIBLINEAR's internal representation of the data, which means you can train multiple models without the overhead of transforming the input data.
The package is not on CRAN yet, so you can install it with `devtools`:
```r
devtools::install_github("dselivanov/sparsio")
```

## Installation
## Quick reference

```r
# install.packages("devtools")
library(devtools)
install_github("sparsity", "felixr")
```
library(Matrix)
library(sparsio)
i = 1:8
j = 1:8
v = rep(2, 8)
x = sparseMatrix(i, j, x = v)
y = sample(c(0, 1), nrow(x), replace = TRUE)
f = tempfile(fileext = ".svmlight")
write_svmlight(x, y, f)
x2 = read_svmlight(f, type = "CsparseMatrix")
identical(x2$x, x)
identical(x2$y, y)
```
4 changes: 0 additions & 4 deletions inst/tests/simple-tests.R

This file was deleted.

11 changes: 0 additions & 11 deletions man/liblinear.Rd

This file was deleted.

12 changes: 0 additions & 12 deletions man/liblinear.liblinearProblem.Rd

This file was deleted.

11 changes: 0 additions & 11 deletions man/liblinear.new.Rd

This file was deleted.
