In [1]:
import numpy as np

# Input formats

This tutorial notebook describes the input formats that Sequentia accepts for observation sequences and class labels.

---

- [Observation sequences](#Observation-sequences)
- [Class labels](#Class-labels)

## Observation sequences

An individual observation sequence is expected to be represented by a $(T \times D)$ [`numpy.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html), where $T$ is the duration of the sequence and $D$ is the number of features. If the sequence has only one feature ($D = 1$), it can also be represented by a one-dimensional $(T,)$ `numpy.ndarray`.
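A one-dimensional $(T,)$ sequence carries the same information as a $(T \times 1)$ array. The following is a minimal sketch of converting between the two with `numpy.ndarray.reshape`. The helper `as_2d` is hypothetical and not part of Sequentia's API.

```python
import numpy as np

def as_2d(x):
    """Hypothetical helper: return a single observation sequence as a (T, D) array.

    A one-dimensional (T,) array is treated as a single-feature sequence (D = 1).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        return x.reshape(-1, 1)  # (T,) -> (T, 1)
    if x.ndim == 2:
        return x  # already (T, D)
    raise ValueError('Expected a (T,) or (T, D) array, got shape {}'.format(x.shape))

univariate = np.array([0.5, 1.2, 0.9, 1.7])                 # T = 4, D = 1
multivariate = np.array([[1., 6.2, 8.8], [3.5, 2.1, 7.4]])  # T = 2, D = 3

print(as_2d(univariate).shape)    # (4, 1)
print(as_2d(multivariate).shape)  # (2, 3)
```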

Since the duration $T^{(i)}$ of each sequence $O^{(i)}$ may differ from that of every other sequence, **a collection of observation sequences must be stored in a `list`**.
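NumPy can only combine arrays into a single `numpy.ndarray` when they all have the same shape. This plain NumPy sketch, independent of Sequentia, shows that `numpy.stack` rejects sequences with different durations, while a `list` holds them without difficulty:

```python
import numpy as np

# Three sequences with D = 3 features but different durations T = 2, 4, 6
X = [np.zeros((T, 3)) for T in (2, 4, 6)]

# numpy.stack requires every input array to have the same shape
try:
    np.stack(X)
    stack_failed = False
except ValueError as err:
    stack_failed = True
    print('Cannot stack sequences:', err)

# A list keeps each sequence with its own duration
print([x.shape for x in X])  # [(2, 3), (4, 3), (6, 3)]
```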

---

The `fit()` and `evaluate()` methods of all Sequentia classifiers accept only multiple observation sequences, whereas the `predict()` method accepts either a single observation sequence or multiple observation sequences.
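To make the distinction concrete, here is a hypothetical check that follows the conventions above. A single sequence is a `numpy.ndarray` of shape $(T,)$ or $(T \times D)$, and a collection is a `list` of such arrays. This is a sketch, not Sequentia's actual implementation.

```python
import numpy as np

def is_single_sequence(X):
    """Hypothetical helper: True for one (T,) or (T, D) array, False for a list of them."""
    if isinstance(X, np.ndarray):
        if X.ndim not in (1, 2):
            raise ValueError('A single sequence must have shape (T,) or (T, D)')
        return True
    if isinstance(X, list) and all(isinstance(x, np.ndarray) for x in X):
        return False
    raise TypeError('Expected a numpy.ndarray or a list of numpy.ndarray objects')

single = np.array([[1., 6.2, 8.8], [3.5, 2.1, 7.4]])
multiple = [np.random.random((T, 3)) for T in (2, 4, 6)]

print(is_single_sequence(single))    # True
print(is_single_sequence(multiple))  # False
```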

In [2]:
# Single observation sequence example
x = np.array([[1., 6.2, 8.8], [3.5, 2.1, 7.4]])
display(x)
print('Observation sequence shape: (T = {}, D = {})'.format(*x.shape))

array([[1. , 6.2, 8.8],
       [3.5, 2.1, 7.4]])

Observation sequence shape: (T = 2, D = 3)


In [3]:
# Multiple observation sequences example
X = [np.random.random((i * 2, 3)) for i in range(1, 4)]
display(X)
print('Observation sequence shapes: {}'.format([x.shape for x in X]))

[array([[0.03130173, 0.75396225, 0.70723105],
        [0.16614266, 0.75650318, 0.72474189]]),
 array([[0.60238758, 0.9529836 , 0.49722337],
        [0.38904382, 0.90776179, 0.84512329],
        [0.54760414, 0.66003086, 0.09628357],
        [0.55706174, 0.64400256, 0.36751279]]),
 array([[0.69270143, 0.39698291, 0.47950034],
        [0.00817308, 0.91378207, 0.92703742],
        [0.77324528, 0.59974812, 0.66695837],
        [0.55661652, 0.35389679, 0.11509332],
        [0.36695717, 0.8114012 , 0.65035435],
        [0.79469376, 0.09025447, 0.71253342]])]

Observation sequence shapes: [(2, 3), (4, 3), (6, 3)]


## Class labels

Each class label is expected to be a string or numeric object. 

**A collection of class labels can be represented by any array-like object**, provided that all labels in the collection are of the same type.
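For example, a `list`, a `tuple` and a `numpy.ndarray` of strings would all be valid collections under this rule, but a mix of strings and integers would not. The helper `labels_share_type` and the example labels are hypothetical. This is a minimal sketch of the same-type requirement, not Sequentia's own validation.

```python
import numpy as np

def labels_share_type(y):
    """Hypothetical helper: True if every label in the array-like y has the same type."""
    return len({type(label) for label in y}) <= 1

as_list = ['walk', 'run', 'walk']
as_tuple = ('walk', 'run', 'walk')
as_array = np.array(['walk', 'run', 'walk'])  # iterating yields numpy.str_ elements
mixed = ['walk', 1, 'run']                    # strings mixed with an int

print(labels_share_type(as_list))   # True
print(labels_share_type(as_tuple))  # True
print(labels_share_type(as_array))  # True
print(labels_share_type(mixed))     # False
```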

---

Internally, Sequentia uses [`sklearn.preprocessing.LabelEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) to map the classes to the non-negative integers $0,1,2,\ldots$. The `predict()` method of every classifier supports a boolean parameter `original_labels` (default `True`). This parameter specifies whether to return the original labels (`True`) or their encoded integers from the $0,1,2,\ldots$ mapping (`False`).
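The sketch below uses `LabelEncoder` directly, not through Sequentia, to show the kind of mapping involved. It uses made-up labels. `fit()` learns the sorted unique classes, `transform()` maps labels to integers, and `inverse_transform()` recovers the original labels. The last two correspond to the two outputs that `original_labels` chooses between.

```python
from sklearn.preprocessing import LabelEncoder

y = ['walk', 'run', 'walk', 'jump']

# Learn the classes; LabelEncoder stores them sorted in classes_
encoder = LabelEncoder().fit(y)
print(encoder.classes_.tolist())  # ['jump', 'run', 'walk']

# Map each label to its index in classes_
encoded = encoder.transform(y)
print(encoded.tolist())  # [2, 1, 2, 0]

# Map the integers back to the original labels
decoded = encoder.inverse_transform(encoded)
print(decoded.tolist())  # ['walk', 'run', 'walk', 'jump']
```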

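The `fit()` and `evaluate()` methods of all Sequentia classifiers accept only multiple labels. The `predict()` method, by contrast, returns either a single label or multiple labels, depending on whether it is given a single observation sequence or multiple observation sequences.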
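In practice, "multiple labels" for `fit()` and `evaluate()` presumably means a one-dimensional collection with one label per observation sequence. The `check_fit_inputs` helper below is a hypothetical sketch of that check. Sequentia's own validation may differ.

```python
import numpy as np

def check_fit_inputs(X, y):
    """Hypothetical helper: validate multiple sequences X against multiple labels y."""
    if not isinstance(X, list):
        raise TypeError('X must be a list of observation sequences')
    y = np.asarray(y)
    if y.ndim != 1:
        # A single label such as 'walk' becomes a 0-dimensional array and is rejected
        raise ValueError('y must be a one-dimensional collection of labels')
    if len(X) != len(y):
        raise ValueError('Got {} sequences but {} labels'.format(len(X), len(y)))
    return X, y

X = [np.random.random((T, 3)) for T in (2, 4, 6)]
y = ['walk', 'run', 'walk']
X, y = check_fit_inputs(X, y)
print(len(X), y.shape)  # 3 (3,)
```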