Sequence is a small library for translating sequences into string kernels and transition probability matrices. It operates on any iterable of hashable types (such as a list of strings).
Sequence is an early work in progress, developed for Mozilla's Test Pilot Project.
NumPy and SciPy are required.
There are two main modules:
- tpm.py - contains the TransitionProbabilityMatrix class, which iteratively grows a probability matrix as new tokens come in.
- string_kernel.py - contains a number of functions to translate sequences into string kernels. It does not yet produce NumPy or SciPy objects, since I'm still working out the details of growing sparse matrices.
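To give a rough idea of what "iteratively grows a probability matrix" can mean, here is a minimal first-order sketch in plain Python. The class name `TransitionCounts` and its methods `add` and `probability` are hypothetical and are not the API of the TransitionProbabilityMatrix class in tpm.py; the sketch only assumes that transitions are counted between consecutive tokens and normalized per row.

```python
from collections import defaultdict


class TransitionCounts:
    """Hypothetical stand-in, NOT the library's TransitionProbabilityMatrix."""

    def __init__(self):
        # counts[src][dst] = number of times dst directly followed src
        self.counts = defaultdict(lambda: defaultdict(int))
        self.previous = None

    def add(self, token):
        """Feed one token; any hashable value works."""
        if self.previous is not None:
            self.counts[self.previous][token] += 1
        self.previous = token

    def probability(self, src, dst):
        """Estimate P(dst | src) from the counts seen so far."""
        row = self.counts.get(src)
        if not row:
            return 0.0
        return row.get(dst, 0) / sum(row.values())


tpm = TransitionCounts()
for token in ["open", "click", "open", "close"]:
    tpm.add(token)

print(tpm.probability("open", "click"))  # 0.5
print(tpm.probability("click", "open"))  # 1.0
```

Rows are stored as nested dictionaries, so new tokens can appear at any time without resizing a dense array.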
Here is an active list of todos:
- get the string_kernel.py module to work nicely with sparse matrices.
- generalize the transition probability matrix to work for any order.