Servlet that computes candidate tool workflows given input file(s) and the user's requirements regarding the output. Afterwards, runs a workflow selected by the user from the list of candidates.
Updated Apr 22, 2015
Using supervised learning, create a set of affix rules for use by the CSTlemma lemmatiser.
Updated Apr 21, 2015
Lemmatiser that uses affix rules (an affix being a prefix, infix, suffix or circumfix). The rules are obtained by supervised learning from a list of full form/lemma pairs.
OpenCV-based plugin for the Anvil annotation software that tracks faces and creates annotations when velocity or acceleration thresholds are exceeded.
Updated Apr 8, 2015
Functions for upper/lower casing, for testing whether a character is a letter, and for converting between the Unicode encodings UTF-8 and UTF-16.
Updated Apr 2, 2015
Reads an RTF or flat text file and outputs the text, one line per sentence and optionally tokenized.
Updated Mar 4, 2015
Modernized version of Eric Brill's part-of-speech tagger.
Updated Feb 13, 2015
Simple implementation of a hash map using separate chaining. The table allocates more buckets if the load factor exceeds 100% and frees buckets if the load factor falls below 20%.
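The resizing policy described above can be illustrated with a minimal sketch. It is not taken from the repository: the class name `ChainedHashMap`, the minimum of 8 buckets, and the doubling/halving steps are assumptions made for illustration; only the 100% and 20% load-factor thresholds come from the description.

```python
class ChainedHashMap:
    """Hash map with separate chaining that grows and shrinks by load factor."""

    MIN_BUCKETS = 8  # assumed lower bound; not stated in the description

    def __init__(self):
        self._buckets = [[] for _ in range(self.MIN_BUCKETS)]
        self._size = 0

    def bucket_count(self):
        return len(self._buckets)

    def _chain(self, key):
        # Each bucket is a list (chain) of (key, value) pairs.
        return self._buckets[hash(key) % len(self._buckets)]

    def _resize(self, new_count):
        # Rehash every entry into a freshly allocated bucket array.
        new_buckets = [[] for _ in range(new_count)]
        for chain in self._buckets:
            for key, value in chain:
                new_buckets[hash(key) % new_count].append((key, value))
        self._buckets = new_buckets

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite existing key
                return
        chain.append((key, value))
        self._size += 1
        # Grow when the load factor exceeds 100% (doubling is an assumption).
        if self._size > len(self._buckets):
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for k, v in self._chain(key):
            if k == key:
                return v
        return default

    def remove(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self._size -= 1
                n = len(self._buckets)
                # Shrink when the load factor falls below 20% (halving is an assumption).
                if n > self.MIN_BUCKETS and self._size < 0.2 * n:
                    self._resize(max(self.MIN_BUCKETS, n // 2))
                return True
        return False
```

With these assumed steps, inserting 100 keys grows the table from 8 to 128 buckets, and removing keys halves it again each time fewer than one bucket in five is occupied on average.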
Parses SGML, HTML and XML in a forgiving way.
Updated Aug 9, 2014
Converts UTF-16 (BE/LE), UTF-32 (BE/LE) and ISO-8859-N to UTF-8. Removes the BOM and surrogate pairs from UTF-8, converting a code point between U+D800 and U+DBFF followed by a code point between U+DC00 and U+DFFF into one valid code point above U+FFFF.
Updated Jun 20, 2014
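The surrogate-pair repair described for the UTF-8 converter follows the standard UTF-16 arithmetic. The sketch below is an illustration, not the tool's code: the function names `combine_surrogates` and `clean_utf8` are hypothetical, and leaving unpaired surrogates untouched is an assumption, since the description does not say how they are handled.

```python
BOM = 0xFEFF


def combine_surrogates(codepoints):
    """Merge each high/low surrogate pair into one code point above U+FFFF."""
    out = []
    i = 0
    while i < len(codepoints):
        cp = codepoints[i]
        nxt = codepoints[i + 1] if i + 1 < len(codepoints) else None
        if 0xD800 <= cp <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # Standard UTF-16 decoding: 10 bits from each half, offset by 0x10000.
            out.append(0x10000 + ((cp - 0xD800) << 10) + (nxt - 0xDC00))
            i += 2
        else:
            out.append(cp)  # assumption: unpaired surrogates pass through unchanged
            i += 1
    return out


def clean_utf8(data):
    """Drop a leading BOM and repair surrogate pairs in UTF-8 encoded bytes."""
    # 'surrogatepass' lets Python decode the three-byte encodings of surrogates.
    codepoints = [ord(ch) for ch in data.decode("utf-8", "surrogatepass")]
    if codepoints and codepoints[0] == BOM:
        codepoints = codepoints[1:]
    text = "".join(chr(cp) for cp in combine_surrogates(codepoints))
    return text.encode("utf-8", "surrogatepass")
```

For example, the pair U+D83D U+DE00 combines to U+1F600, so the six bytes `ED A0 BD ED B8 80` become the four bytes `F0 9F 98 80`.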