Ondřej Moravčík edited this page Mar 25, 2015 · 11 revisions

LabelPoint

Class that represents the label and features of a data point.

  • label: Label for this data point.
  • features: List of features for this data point.

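The features of a data point form a local vector, either dense or sparse. In MLlib, labeled points are used in supervised learning algorithms. For binary classification, the label should be either 0 (negative) or 1 (positive). For multiclass classification, labels should be class indices starting from zero: 0, 1, 2, and so on. The label is numeric, so labeled points can also carry regression targets such as the 5.0 used in the examples below.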

# LabelPoint.new(label, features)

LabelPoint.new(5.0, [1, 2, 3])
LabelPoint.new(5.0, DenseVector.new([1, 2, 3, 4, 5]))
LabelPoint.new(5.0, SparseVector.new(5, {1 => 1.0, 3 => 5.5}))
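Multiclass labels must be zero-based class indices, so raw class names usually need to be encoded first. The sketch below shows one way to do that in plain Ruby. `encode_labels` is a hypothetical helper, not part of the library, and numbering classes in order of first appearance is an arbitrary choice.

```ruby
# Hypothetical helper (not part of ruby-spark): map arbitrary class names to
# the zero-based indices that multiclass algorithms expect (0, 1, 2, ...).
def encode_labels(class_names)
  index = {}
  # Assign each new class name the next free index, in order of first appearance.
  class_names.each { |name| index[name] ||= index.size }
  # Labels are returned as floats, matching the 5.0-style labels above.
  labels = class_names.map { |name| index.fetch(name).to_f }
  [labels, index]
end

labels, mapping = encode_labels(%w[cat dog cat bird])
p labels  # => [0.0, 1.0, 0.0, 2.0]
```

Each encoded label (for example `labels[1]`) could then be passed as the first argument of `LabelPoint.new`, with `mapping` kept around to translate predictions back into class names.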

Vector

Currently, vectors are represented by Ruby's standard Vector class. Future versions should support more implementations.
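For reference, this is how Ruby's standard `Vector` class (from the `matrix` library, which ships as a bundled gem since Ruby 3.1) behaves on its own. Only stdlib calls are used here; this is not ruby-spark's wrapper API.

```ruby
require 'matrix'

# Plain stdlib vectors, built from a literal list or from an array.
a = Vector[1, 2, 3]
b = Vector.elements([4.0, 5.0, 6.0])

p a.size              # => 3
p a[1]                # => 2
p a.inner_product(b)  # => 32.0

sum = a + b           # element-wise addition
p sum.to_a            # => [5.0, 7.0, 9.0]
```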

DenseVector

A dense vector stores every element, including zeros; it is the natural choice when most of the elements are non-zero.

# DenseVector.new(values)

DenseVector.new([1,2,3,4,5]).values
# => [1, 2, 3, 4, 5]

DenseVector.new(1..5).values
# => [1, 2, 3, 4, 5]
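A rough way to choose between the dense and sparse representations is to count how many numbers each must store: a dense vector keeps one slot per element, while a sparse vector keeps an index and a value for every non-zero entry. The sketch below encodes that rule of thumb. `prefer_dense?` is hypothetical, and it ignores real memory overhead such as object headers and element types.

```ruby
# Hypothetical rule of thumb (not part of ruby-spark):
#   dense  -> one slot per element               (size numbers)
#   sparse -> one index + one value per non-zero (2 * nnz numbers)
def prefer_dense?(values)
  nnz = values.count { |x| x != 0 }
  2 * nnz >= values.size
end

p prefer_dense?([1, 2, 3, 4, 5])        # => true
p prefer_dense?([0, 1.0, 0, 5.5])       # => true  (2 * 2 >= 4)
p prefer_dense?([0, 0, 0, 0, 0, 0, 7])  # => false
```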

SparseVector

A sparse vector is one in which most of the elements are zero. It is represented by an array of indices and an array of the corresponding values, so only the non-zero entries are stored.
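The index/value representation can be illustrated by splitting a dense array into its non-zero positions and values. `to_sparse` below is a hypothetical helper; its result has the same `(size, indices, values)` shape as the three-argument form of `SparseVector.new`.

```ruby
# Hypothetical helper: split a dense array into the size, index array and
# value array that a sparse vector keeps. Zero entries are skipped.
def to_sparse(dense)
  indices = []
  values = []
  dense.each_with_index do |x, i|
    next if x.zero?
    indices << i
    values << x
  end
  [dense.size, indices, values]
end

size, indices, values = to_sparse([0, 1.0, 0, 5.5])
p size     # => 4
p indices  # => [1, 3]
p values   # => [1.0, 5.5]
```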

# SparseVector.new(size, values_or_indices, values)

SparseVector.new(4, {1 => 1.0, 3 => 5.5}).values
# => [0, 1.0, 0, 5.5]

SparseVector.new(4, [[1, 3], [1.0, 5.5]]).values
# => [0, 1.0, 0, 5.5]

SparseVector.new(4, [1, 3], [1.0, 5.5]).values
# => [0, 1.0, 0, 5.5]
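The three calls above describe the same vector. A minimal sketch of how such arguments could be normalized and expanded into a dense array follows. `sparse_to_dense` is hypothetical and is not taken from the ruby-spark source.

```ruby
# Hypothetical sketch: normalize the three accepted argument forms to
# (indices, values), then expand them into a dense array of length size.
def sparse_to_dense(size, values_or_indices, values = nil)
  indices, vals =
    if values_or_indices.is_a?(Hash)   # {index => value}
      [values_or_indices.keys, values_or_indices.values]
    elsif values.nil?                  # [[indices], [values]]
      values_or_indices
    else                               # [indices], [values]
      [values_or_indices, values]
    end

  dense = Array.new(size, 0)
  indices.zip(vals) do |i, v|
    raise IndexError, "index #{i} out of range" unless (0...size).cover?(i)
    dense[i] = v
  end
  dense
end

p sparse_to_dense(4, { 1 => 1.0, 3 => 5.5 })  # => [0, 1.0, 0, 5.5]
p sparse_to_dense(4, [[1, 3], [1.0, 5.5]])    # => [0, 1.0, 0, 5.5]
p sparse_to_dense(4, [1, 3], [1.0, 5.5])      # => [0, 1.0, 0, 5.5]
```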

Matrix

Matrices are represented by Ruby's standard Matrix class and are always 2-dimensional.
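For reference, here is Ruby's standard `Matrix` class on its own (stdlib calls only, not ruby-spark's wrapper). Every matrix has a fixed number of rows and columns, and entries are addressed by a zero-based `[row, column]` pair.

```ruby
require 'matrix'

# A 2 x 3 matrix built from row arrays.
m = Matrix[[1, 2, 3], [4, 5, 6]]

p m.row_count        # => 2
p m.column_count     # => 3
p m[1, 2]            # => 6  (row 1, column 2)
p m.transpose.to_a   # => [[1, 4], [2, 5], [3, 6]]
```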

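DenseMatrix

A dense matrix stores every entry explicitly; in the example below the values are passed row by row.

# DenseMatrix.new(rows, cols, values)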

DenseMatrix.new(2, 3, [[1,2,3], [4,5,6]]).values
# => [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
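Judging from the example output, the integer inputs are converted to floats. A hypothetical helper that checks the declared shape and performs that conversion with the standard `Matrix` class might look like this; `dense_matrix` is an illustration, not the library's implementation.

```ruby
require 'matrix'

# Hypothetical helper mirroring DenseMatrix.new(rows, cols, values):
# verify the declared shape, then build a Matrix of Float entries.
def dense_matrix(rows, cols, values)
  unless values.size == rows && values.all? { |row| row.size == cols }
    raise ArgumentError, "expected #{rows}x#{cols} values"
  end
  Matrix.rows(values.map { |row| row.map(&:to_f) })
end

m = dense_matrix(2, 3, [[1, 2, 3], [4, 5, 6]])
p m.to_a  # => [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```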
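
SparseMatrix

A sparse matrix is stored in compressed sparse column (CSC) format, described by three arrays:

  • col_pointers: For each column, the position in row_indices and values where that column's entries start, plus a final element equal to the number of non-zero entries.
  • row_indices: The row index of each entry. Within each column, the row indices must be strictly increasing.
  • values: The non-zero matrix entries, in column-major order.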
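To see how the three arrays are derived, the sketch below walks a dense matrix column by column and records each non-zero entry. `to_csc` is a hypothetical helper; applied to the dense matrix shown in the example below, it produces the same arguments that example passes to `SparseMatrix.new`.

```ruby
# Hypothetical helper: build [rows, cols, col_pointers, row_indices, values]
# (compressed sparse column layout) from a dense array of rows.
def to_csc(dense)
  rows = dense.size
  cols = dense.first.size
  col_pointers = [0]
  row_indices = []
  values = []

  cols.times do |c|
    rows.times do |r|            # rows visited in increasing order
      x = dense[r][c]
      next if x.zero?
      row_indices << r
      values << x
    end
    col_pointers << values.size  # where the next column starts
  end

  [rows, cols, col_pointers, row_indices, values]
end

p to_csc([[1.0, 0.0, 4.0],
          [0.0, 3.0, 5.0],
          [2.0, 0.0, 6.0]])
# => [3, 3, [0, 2, 3, 6], [0, 2, 1, 0, 1, 2], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
```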
# SparseMatrix.new(rows, cols, col_pointers, row_indices, values)

SparseMatrix.new(3, 3, [0, 2, 3, 6], [0, 2, 1, 0, 1, 2], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).values

# => [
#      [1.0, 0.0, 4.0],
#      [0.0, 3.0, 5.0],
#      [2.0, 0.0, 6.0]
#    ]
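Going the other way, the entries of column `c` occupy positions `col_pointers[c]` up to (but not including) `col_pointers[c + 1]` in `row_indices` and `values`. The hypothetical `csc_to_dense` helper below uses that rule to reproduce the dense output shown above; it is a plain-Ruby sketch, not the library's `values` implementation.

```ruby
# Hypothetical helper: expand compressed sparse column arrays into dense rows.
def csc_to_dense(rows, cols, col_pointers, row_indices, values)
  dense = Array.new(rows) { Array.new(cols, 0.0) }
  cols.times do |c|
    # Column c owns positions col_pointers[c] ... col_pointers[c + 1] - 1.
    (col_pointers[c]...col_pointers[c + 1]).each do |k|
      dense[row_indices[k]][c] = values[k]
    end
  end
  dense
end

p csc_to_dense(3, 3, [0, 2, 3, 6], [0, 2, 1, 0, 1, 2],
               [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
# => [[1.0, 0.0, 4.0], [0.0, 3.0, 5.0], [2.0, 0.0, 6.0]]
```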