In [1]:
import numpy as np

# Input formats

This tutorial notebook details the input formats for observation sequences and class labels that Sequentia accepts.

---

- [Observation sequences](#Observation-sequences)
- [Class labels](#Class-labels)

## Observation sequences

_Observation sequences are expected to be `numpy.ndarray` objects, with a collection of them stored specifically in a `list`. The `list` type is required because observation sequences may have different lengths, and a single `numpy.ndarray` or matrix must be rectangular, so it cannot hold them._
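The small illustration below shows why a `list` is needed. It uses only NumPy, not Sequentia itself. Sequences of different durations fit in a list without any trouble, but `np.stack` refuses to combine them into one array because every element along the new axis would need the same $(T \times D)$ shape.

```python
import numpy as np

# Three multivariate sequences (D=3) with different durations T = 2, 4, 6.
sequences = [np.random.random((2 * i, 3)) for i in range(1, 4)]

# A plain Python list holds arrays of different lengths without complaint.
lengths = [seq.shape[0] for seq in sequences]
print(lengths)  # [2, 4, 6]

# Stacking them into one rectangular array fails because the shapes differ.
stack_failed = False
try:
    np.stack(sequences)
except ValueError as err:
    stack_failed = True
    print('Cannot stack sequences of different lengths:', err)
```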

_Furthermore, since Sequentia generally handles multivariate observation sequences, each `numpy.ndarray` is expected to be two-dimensional, of shape $(T \times D)$._

> **Where**: 
> - $T$ is the duration of the observation sequence, or number of frames.
> - $D$ is the dimensionality of the observation sequence, or number of features.

However, one-dimensional NumPy arrays are also supported (e.g. a univariate sequence of shape $(T,)$).
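If you want to check your own data before passing it in, a helper along the following lines can bring each sequence into the two-dimensional $(T \times D)$ form. This is a hypothetical sketch, not Sequentia's internal validation code. It *assumes* that a one-dimensional array of shape $(T,)$ should be read as a univariate sequence, i.e. reshaped to $(T, 1)$.

```python
import numpy as np

def as_observation_sequence(x):
    """Hypothetical helper: return x as a 2-D (T, D) float array.

    Assumption: a 1-D array of shape (T,) is a univariate sequence (D = 1).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        # Treat each scalar as a one-feature frame: (T,) -> (T, 1).
        x = x.reshape(-1, 1)
    elif x.ndim != 2:
        raise ValueError('Expected a 1-D or 2-D array, got {}-D'.format(x.ndim))
    return x

print(as_observation_sequence([0.5, 1.2, 0.8]).shape)                    # (3, 1)
print(as_observation_sequence([[1., 6.2, 8.8], [3.5, 2.1, 7.4]]).shape)  # (2, 3)
```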

In [2]:
# Single observation sequence example
x = np.array([[1., 6.2, 8.8], [3.5, 2.1, 7.4]]) # T=2, D=3
display(x)
print('Observation sequence shape: {}'.format(x.shape))

array([[1. , 6.2, 8.8],
       [3.5, 2.1, 7.4]])

Observation sequence shape: (2, 3)


In [3]:
# Multiple observation sequences example
X = [np.random.random((2*i, 3)) for i in range(1, 4)]
display(X)
print('Observation sequence shapes: {}'.format([x.shape for x in X]))

[array([[0.88106508, 0.99858085, 0.77369155],
        [0.28284203, 0.75292064, 0.95375735]]),
 array([[0.41206904, 0.86036593, 0.58773291],
        [0.65625876, 0.20153214, 0.84528973],
        [0.39544331, 0.60924804, 0.38701804],
        [0.40972901, 0.54378744, 0.08992236]]),
 array([[0.0499857 , 0.91966294, 0.87008349],
        [0.18096322, 0.32062816, 0.24550721],
        [0.39841299, 0.76461367, 0.00779053],
        [0.31823326, 0.37545643, 0.12359424],
        [0.80709865, 0.50087484, 0.65803457],
        [0.37960356, 0.56724547, 0.57579345]])]

Observation sequence shapes: [(2, 3), (4, 3), (6, 3)]


The `fit()` and `evaluate()` methods of all Sequentia classifiers only accept multiple observation sequences (a `list` of `numpy.ndarray` objects). The `predict()` method, however, accepts either a single observation sequence or multiple ones.
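One common way for a `predict()`-style method to accept both forms is to wrap a lone array in a list, process the list, and unwrap the result again. The sketch below shows that pattern with hypothetical helper names; it is not Sequentia's code, and it *assumes* any `numpy.ndarray` argument is a single sequence while anything else is a list of sequences. A toy function that returns sequence durations stands in for real prediction.

```python
import numpy as np

def to_sequence_list(X):
    """Hypothetical helper: normalise single-or-multiple input to a list.

    Returns (sequences, was_single) so the caller can unwrap its result.
    """
    if isinstance(X, np.ndarray):
        # A single observation sequence, e.g. of shape (T, D).
        return [X], True
    # Otherwise assume a list of observation sequences.
    return list(X), False

def sequence_lengths(X):
    """Toy stand-in for predict(): returns the duration T of each sequence."""
    sequences, was_single = to_sequence_list(X)
    results = [seq.shape[0] for seq in sequences]
    return results[0] if was_single else results

print(sequence_lengths(np.zeros((2, 3))))                       # 2
print(sequence_lengths([np.zeros((2, 3)), np.zeros((4, 3))]))   # [2, 4]
```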

## Class labels

_Class labels are expected to be `str` objects, with a collection of them stored specifically in a `list`._

---

This is a direct consequence of the [`pomegranate.hmm.HiddenMarkovModel`](https://pomegranate.readthedocs.io/en/latest/HiddenMarkovModel.html#pomegranate.hmm.HiddenMarkovModel) class requiring a string to be passed as the `name` parameter of its constructor. In an HMM classifier, each HMM represents a single class, so we set the `name` of each HMM to the label of the class it represents.
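The "one model per class, named by its label" idea can be sketched without pomegranate. In the sketch below, models live in a `dict` keyed by the class label, and prediction returns the key of the best-scoring model. To keep the example self-contained, a single diagonal Gaussian over all frames of a class *stands in* for each HMM (roughly a one-state HMM that ignores transitions). The function names are hypothetical.

```python
import numpy as np

def fit_class_models(X, y):
    """Fit one stand-in model per class, keyed by its string label.

    Each 'model' is a diagonal Gaussian over every frame of that class's
    sequences, used only as a placeholder for an HMM.
    """
    models = {}
    for label in sorted(set(y)):
        frames = np.vstack([x for x, lab in zip(X, y) if lab == label])
        # A small variance floor avoids division by zero for constant features.
        models[label] = (frames.mean(axis=0), frames.var(axis=0) + 1e-6)
    return models

def log_likelihood(x, model):
    """Sum of per-frame diagonal-Gaussian log densities for sequence x."""
    mean, var = model
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)))

def predict_label(x, models):
    """Return the label (the model's 'name') whose model scores x highest."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))

rng = np.random.default_rng(0)
X = [rng.normal(0, 1, (5, 2)) for _ in range(3)] + [rng.normal(5, 1, (5, 2)) for _ in range(3)]
y = ['low'] * 3 + ['high'] * 3
models = fit_class_models(X, y)
x_new = rng.normal(5, 1, (4, 2))
print(sorted(models))               # ['high', 'low']
print(predict_label(x_new, models)) # high
```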

The $k$-NN implementation in Sequentia could easily be modified internally to handle labels of arbitrary type. However, for consistency with the string-label restriction on `HMM`s described above, the `DTWKNN` class also requires labels to be strings.
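To see why the $k$-NN side does not inherently need string labels, consider a minimal DTW $k$-NN sketch. It is not Sequentia's `DTWKNN` implementation, and its function names are hypothetical. Labels are only counted in a majority vote, so strings and integers work equally well. The DTW here is the textbook $O(T_1 T_2)$ recurrence with a Euclidean frame cost.

```python
from collections import Counter

import numpy as np

def dtw_distance(a, b):
    """Textbook dynamic time warping between two (T, D) arrays."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frame i-1 of a and frame j-1 of b.
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def knn_predict(x, X_train, y_train, k=1):
    """Majority vote over the k DTW-nearest training sequences.

    Labels are only compared and counted, so any hashable type works.
    """
    distances = [dtw_distance(x, x_train) for x_train in X_train]
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = [np.zeros((3, 2)), np.zeros((4, 2)), np.ones((3, 2)) * 5]
x = np.zeros((5, 2)) + 0.1
print(knn_predict(x, X_train, ['a', 'a', 'b'], k=1))  # a
print(knn_predict(x, X_train, [0, 0, 1], k=3))        # 0
```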

The `fit()` and `evaluate()` methods of all Sequentia classifiers expect labels as a `list` of `str` objects (i.e. `List[str]`).
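If your labels come from elsewhere as integers or a NumPy array, you can convert them with `str()` before calling `fit()` or `evaluate()`. The checker below is a hypothetical helper, not part of Sequentia. It only confirms that the result has the `list`-of-`str` form described above.

```python
import numpy as np

def check_labels(y):
    """Hypothetical helper: raise TypeError unless y is a list of str."""
    if not isinstance(y, list):
        raise TypeError('Expected a list of labels, got {}'.format(type(y).__name__))
    bad = [label for label in y if not isinstance(label, str)]
    if bad:
        raise TypeError('Non-string labels found: {!r}'.format(bad[:3]))
    return y

y_raw = np.array([0, 1, 1, 2])       # e.g. integer class IDs from another source
y = [str(label) for label in y_raw]  # convert each label to str
check_labels(y)
print(y)  # ['0', '1', '1', '2']
```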