This is a framework for classifying datastreams.
It includes a few interfaces, a function to run an experiment with specified modules, and a few implementations of different modules.
An object that can provide a stream of observations and corresponding labels at arbitrary points in time.
Properties:
tMax
: the maximum time (ending time) of the experimenty
: A column vector specifying the possible class labelsd
: A scalar specifying the number of dimensions in the dataset
Methods:
[x, y] = obj.sample(t, y)
: Sample a new observation at timet
from classy
.y
should be optional and drawn from a uniform distribution containing all possible classes by default.t
is required.
Implementations:
RotatingGaussians- a 2d environment that generates observations from Gaussians that are rotating around the origin
obj = RotatingGaussians(n);
n
is the number of Gaussian classes to use. They will be divided equally around the origin, and will all rotate in the same direction around a circle of radius 5 at1
full rotation per second.
TrailingGaussians- an experiment where Gaussians will move uniformly along a line.
obj = TrailingGaussians(opts)
opts
is astruct
with the following fields:
Field | Description |
---|---|
C |
The number of classes (Gaussians) to sample from (defualt: 3 ) |
dmu |
The rate at which the distributions move. Either a scalar or a vector in units / second to move in each dimension. (defualt: 1 ) |
spread |
How far (Euclidean distance) each Gaussian will be from the one in front / behind it (default: 5 ) |
sig |
Either a scalar specifying the diagonal covariances of each Gaussian or a covariance matrix (default: 1 ) |
d |
The number of dimensions for the experiment (must agree with sig ) (default: 2 ) |
tMax |
The maximum time of the experiment (default: 1.1 times the number of seconds required for the last distribution to reach the original position of the first) |
This object is used to plot datasets that come from a stream. Objects that implement this interface should hide older observations as the data changes.
Properties:
axh
: The handle to the axes the plotter is usingn
: The number of most recent points to retain on the graphcolors
: A * by 3 matrix specifying the RGB values of the plotter's colormap. This can be any length, and the plotter should restart from the beginning of the matrix if there are more classes than colors
Methods:
obj.plot(X, c)
plot observations inX
as color specified by indexc
.c
must have the same number of rows asX
or be scalar.
Implementations:
Plotter2d- a plotter in 2 dimensions that will discard old observations
in a FIFO manner. Note: observations that were provided to the plotter in
batches will be deleted in batches. So if plot
was called with X
being
matrix with 50 rows, those observations will remain on the axes until plot
is called opts.N
more times, at which point they will all be deleted at
once.
obj = Plotter2d(opts);
opts
is astruct
with the following fields:
Field | Description |
---|---|
axh | An existing axis handle that Plotter2d should attach to |
N | The number of points to retain on the graph at once (FIFO) |
colors | A Cx3 matrix where C is the number of colors in the colormap and columns are RGB values |
The actual classifier for data streams
Methods:
-
obj.train(X, y, t)
: Train the classifier with observations specified inX
that were drawn at timet
from classy
. The number of rows inX
,y
, andt
must match.y
can be a vector or matrix containing posterior probabilities. Ify
is a matrix, the sum of its columns must be a vector of1
s. -
h = obj.classify(X, t)
: Predict the labels of the observations inX
drawn at timet
. The number of rows inX
andt
must be the same.h
is a vector containing the same number of rows asX
andt
, with integers specifying the class label that the observations belong to, according to the classifier.
Implementations:
ForgettingKnnClassifier
obj = ForgettingKnnClassifier(opts);
opts
is astruct
with the following fields:
Field | Description |
---|---|
rowPadding |
The number of rows to grow the resizable array X when it gets full (default: 500 ) |
k |
The number of nearest neighbors to consider (default: 25 ) |
beta |
The forgetting rate over time (default: .1 ) |