Data Mining / ML tools for analyzing sequential data.

Sequence

Sequence is a small library for translating sequences into string kernels and transition probability matrices. It operates on any iterable of hashable types (such as a list of strings).

Sequence is an early work in progress, developed for Mozilla's Test Pilot Project.

Details

NumPy / SciPy are required.

There are two main modules:

  • tpm.py - contains the TransitionProbabilityMatrix class. Iteratively grows a probability matrix as new tokens come in.
  • string_kernel.py - contains a number of functions to translate sequences into string kernels. It does not yet produce NumPy or SciPy objects, since I'm still working out how to grow sparse matrices.
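
To make the two ideas concrete, here is a minimal sketch in plain Python. It is not the API of tpm.py or string_kernel.py: the names TransitionCounts and spectrum_kernel are hypothetical, and the k-spectrum kernel is just one common string kernel, chosen as an assumption for illustration. The kernel sketch also assumes its inputs support slicing (lists or strings).

```python
from collections import Counter, defaultdict


class TransitionCounts:
    """Hypothetical first-order transition table; not the tpm.py API."""

    def __init__(self):
        # counts[prev][cur] = number of times `cur` followed `prev`
        self.counts = defaultdict(Counter)
        self.prev = None

    def add(self, token):
        """Record one incoming token and update the running counts."""
        if self.prev is not None:
            self.counts[self.prev][token] += 1
        self.prev = token

    def probability(self, prev, cur):
        """Estimate P(cur | prev) from the counts seen so far."""
        row = self.counts.get(prev)
        if not row:
            return 0.0
        return row[cur] / sum(row.values())


def spectrum_kernel(a, b, k=2):
    """k-spectrum kernel: dot product of the k-gram count vectors of a and b."""
    grams_a = Counter(tuple(a[i:i + k]) for i in range(len(a) - k + 1))
    grams_b = Counter(tuple(b[i:i + k]) for i in range(len(b) - k + 1))
    return sum(n * grams_b[g] for g, n in grams_a.items())


# Feed tokens one at a time, as they would arrive from a stream.
clicks = ["open", "scroll", "open", "close"]
tpm = TransitionCounts()
for token in clicks:
    tpm.add(token)
```

After the loop, tpm.probability("open", "scroll") is 0.5, because "open" was followed once by "scroll" and once by "close". Storing counts rather than probabilities keeps each update cheap; rows are normalized only when a probability is requested.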

To-Dos

Here is the active list of to-dos:

  • get the string_kernel.py module to work nicely with sparse matrices.
  • generalize the transition probability matrix to work for any order.
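
One possible approach to the second item, sketched under assumptions rather than taken from the library: treat the last `order` tokens as a single state, keyed by a tuple. The function name transition_probabilities and its signature are hypothetical, and the result is a nested dict rather than a NumPy or SciPy matrix.

```python
from collections import Counter, defaultdict, deque


def transition_probabilities(tokens, order=1):
    """Estimate P(next | previous `order` tokens) in one pass over tokens."""
    counts = defaultdict(Counter)
    history = deque(maxlen=order)  # sliding window of the last `order` tokens
    for token in tokens:
        # Only count a transition once the window is full.
        if len(history) == order:
            counts[tuple(history)][token] += 1
        history.append(token)
    # Normalize each row of counts into probabilities.
    return {
        state: {tok: n / sum(row.values()) for tok, n in row.items()}
        for state, row in counts.items()
    }


probs = transition_probabilities("abcabd", order=2)
```

Here probs[("a", "b")] is {"c": 0.5, "d": 0.5}, since the pair "ab" is followed once by "c" and once by "d". Because states are tuples of hashable tokens, the same code works for order 1, 2, or higher without changes.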