HFST - The Helsinki Finite-State Transducer technology is intended for creating and manipulating weighted or unweighted synchronous transducers implementing regular relations. HFST software uses UTF-8 as its character encoding. Currently, HFST is implemented on top of the SFST, OpenFst and foma software libraries; other back ends may be added in a future release. The SFST and foma implementations are unweighted, and the OpenFst implementation is weighted.
Part of the HFST interface has also been implemented for HFST's own two transducer formats, HfstBasicTransducer and the optimized lookup format. The former is useful for accessing individual states and transitions of a transducer, converting between transducer formats and storing transducers in an implementation-independent format. The latter is used for fast lookup of strings in a transducer.
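As a minimal sketch of the state-and-transition access that HfstBasicTransducer offers, the snippet below builds a two-state transducer by hand and lists its arcs. The function name `build_foo_bar` and the symbols `foo` and `bar` are illustrative only, and the import guard is an assumption added so that the snippet does not fail where the `hfst` package is not installed.

```python
try:
    import hfst
except ImportError:  # the HFST Python package is not installed
    hfst = None


def build_foo_bar():
    """Build a transducer mapping 'foo' to 'bar' and return its arcs.

    Illustrative helper, not part of the HFST API.
    """
    fsm = hfst.HfstBasicTransducer()   # a new transducer has initial state 0
    fsm.add_state(1)                   # add a second state
    fsm.set_final_weight(1, 0.0)       # make state 1 final with weight 0
    fsm.add_transition(0, 1, 'foo', 'bar', 0.0)

    arcs = []
    for state in fsm.states():
        for arc in fsm.transitions(state):
            arcs.append((state,
                         arc.get_target_state(),
                         arc.get_input_symbol(),
                         arc.get_output_symbol()))
    return arcs


# None when hfst is unavailable, otherwise a list of (source, target, in, out)
arcs = build_foo_bar() if hfst is not None else None
print(arcs)
```

The resulting object can be passed to `hfst.HfstTransducer(fsm)` to obtain a transducer in the current default back-end format.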
All back-end implementations - SFST, OpenFst and foma - work according to the same interface, so it is possible to compile the same piece of code using different back-end libraries. There are some differences related to weights, as only OpenFst supports them.
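The shared interface can be sketched as follows: the same regular expression is compiled and queried once per available back end, with only the default transducer type changing between runs. This is a hedged illustration, not a recommended workflow; the helper name `lookup_on_all_backends` is hypothetical, and the import guard is an assumption added so that the snippet does not fail where the `hfst` package is not installed.

```python
try:
    import hfst
except ImportError:  # the HFST Python package is not installed
    hfst = None


def lookup_on_all_backends(expr, word):
    """Compile `expr` with each available back end and look up `word`.

    Illustrative helper, not part of the HFST API.
    """
    results = {}
    previous = hfst.get_default_fst_type()
    try:
        for name in ('SFST_TYPE', 'TROPICAL_OPENFST_TYPE', 'FOMA_TYPE'):
            impl = getattr(hfst.ImplementationType, name)
            # A given build of HFST may lack some back ends.
            if not hfst.HfstTransducer.is_implementation_type_available(impl):
                continue
            hfst.set_default_fst_type(impl)
            tr = hfst.regex(expr)
            # lookup() yields (output string, weight) pairs by default;
            # keep only the output strings.
            results[name] = [pair[0] for pair in tr.lookup(word)]
    finally:
        hfst.set_default_fst_type(previous)  # restore the caller's default
    return results


results = lookup_on_all_backends('a:b', 'a') if hfst is not None else None
print(results)
```

Only the weights would be expected to differ between back ends, since SFST and foma are unweighted; the sketch therefore discards them.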
HFST is written in C++, but a Python interface is also available and is documented on these pages. The Python API is essentially a wrapper around the C++ API with some additional code and modifications. The C++ API has been developed around the HFST command line tools (documented in a separate wiki), whereas the Python version is intended to be used as such and has been designed to be more user-friendly.
For a quick start to the HFST interface with examples, see here
- Create transducers and apply operations on them
- Create transducers from scratch
- Iterate through a transducer's states and transitions
- Create transducers by tokenizing UTF-8 strings with multicharacter symbols
- Apply two-level, restriction and coercion rules
- Download and install HFST Python API