# Example

In [1]:
import tom

## Creating training and test data

First, we initialize a random generator. We use it to randomly create a 7-dimensional OOM with an alphabet size of $|\Sigma| = 5$ and a 7-dimensional input-output OOM (IO-OOM) with an output alphabet size of $|\Sigma_O| = 5$ and an input alphabet size of $|\Sigma_I| = 3$. From each of these, we sample a training sequence of length $10^7$ and five test sequences of length $10^5$. For the input-output OOM, the inputs are by default chosen independently and uniformly at random at each time step, since for simplicity we do not supply an input policy.

We will use initial subsequences of the training sequences of increasing lengths $\{10^2, 10^{2.5}, 10^3, 10^{3.5}, \ldots, 10^7\}$ as data for the OOM and IO-OOM estimation, and test the performance of the learnt models on the test sequences.

In [2]:
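rand = tom.Random(123456789)
# Random 7-dimensional OOM with 5 output symbols and no inputs
oom = tom.Oom(7, 5, 0, 10, rand)
# Random 7-dimensional IO-OOM with 5 output symbols and 3 input symbols
io_oom = tom.Oom(7, 5, 3, 10, rand)
train_seq = oom.generate(10**7, rand)
# The IO training sequence must be sampled from io_oom, not from oom
io_train_seq = io_oom.generate(10**7, rand)
test_seqs = []
io_test_seqs = []
for i in range(5):
    # Reset both models to their initial state before sampling each test sequence
    oom.reset()
    io_oom.reset()
    test_seqs.append(oom.generate(10**5, rand))
    io_test_seqs.append(io_oom.generate(10**5, rand))
# Training lengths 10^2, 10^2.5, ..., 10^7 (k/2 assumes true division, as in Python 3)
train_lengths = [int(10**(k/2)) for k in range(4,15)]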

## Performing spectral learning

For each spectral learning run, the following steps need to be performed:

1. For sequences $\bar{x}$, estimate from the available data the values $\hat{f}(\bar{x})$, where $f(\bar{x}) = P(\bar{x})$ is the stationary probability of observing $\bar{x}$. This is accomplished by a `tom.Estimator` object, which uses a suffix tree representation of the data in the form of a `tom.STree` to compute these estimates efficiently.

2. Select sets $X, Y \subseteq \Sigma^*$ of "indicative" and "characteristic" words, which will index the columns and rows of the matrices assembled in the next step and thereby determine which of the above sequence estimates are used for the spectral learning. This is accomplished by the function `tom.getWordsFromData`.

3. Assemble the estimates into matrices $\hat{F}^{X,Y}$ and $\hat{F}_z^{X,Y}$.
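To make the estimation and matrix-assembly steps above concrete, here is a minimal sketch in plain Python and NumPy. It does not use the `tom` API: `estimate_f` and `assemble_matrix` are hypothetical helpers, the estimates are obtained by brute-force counting rather than from a suffix tree, and the words in `X` and `Y` are picked by hand. The entry convention, $\hat{f}(\bar{x}\bar{y})$ for $\hat{F}^{X,Y}$ and $\hat{f}(\bar{x}z\bar{y})$ for $\hat{F}_z^{X,Y}$ with indicative words indexing columns and characteristic words indexing rows, is the usual one in the OOM literature and is assumed here.

```python
from collections import Counter

import numpy as np


def estimate_f(data, max_len):
    """Return a function giving relative-frequency estimates of f(word).

    Hypothetical helper (not part of tom): counts every word of length
    1..max_len in `data` by brute force instead of using a suffix tree.
    """
    n = len(data)
    counts = Counter()
    for start in range(n):
        for length in range(1, min(max_len, n - start) + 1):
            counts[tuple(data[start:start + length])] += 1

    def f_hat(word):
        word = tuple(word)
        if not word:
            return 1.0  # the empty word is always observed
        if len(word) > max_len:
            raise ValueError("word is longer than max_len")
        positions = n - len(word) + 1  # number of possible start positions
        return counts[word] / positions if positions > 0 else 0.0

    return f_hat


def assemble_matrix(f_hat, X, Y, z=()):
    """Matrix with entry [i, j] = f_hat(X[j] + z + Y[i]).

    Assumed convention: indicative words X index the columns,
    characteristic words Y index the rows.
    """
    return np.array(
        [[f_hat(tuple(x) + tuple(z) + tuple(y)) for x in X] for y in Y]
    )


# Toy data over the alphabet {0, 1}; X and Y are chosen by hand here.
data = [0, 1, 0, 1, 0, 1, 0, 1]
f_hat = estimate_f(data, max_len=3)
X = [(0,), (1,)]
Y = [(0,), (1,)]
F = assemble_matrix(f_hat, X, Y)            # estimates of f(x y)
F_0 = assemble_matrix(f_hat, X, Y, z=(0,))  # estimates of f(x 0 y)
```

The brute-force counting needs time proportional to the data length times `max_len`, which is why a suffix tree is preferable for sequences as long as the training data used here.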

In [5]:
tom.tomlib.wordsFromData

SyntaxError: invalid syntax (<ipython-input-5-e4a12667f57a>, line 1)

In [None]:
abbabbc : "xa and xab and xabb have same statistics". 