The hist-pl package is a wrapper package covering functionality related to the historical dictionary of Polish. It also provides a command-line tool which can be used to create the binary version of the dictionary and to perform simple analysis of the input text.
To install hist-pl from the official Hackage repository run:
cabal install hist-pl
If you want to upgrade hist-pl to a newer version you should update the package list first:
cabal update
cabal install hist-pl
To install the latest development version from GitHub, run
cabal install
from the umbrella repository directory.
Before you use the library, make sure that
Morfeusz is available.
Otherwise, you will get an error:
hist-pl: error while loading shared libraries: libmorfeusz.so.0
The Morfeusz bindings library is a dependency of the
hist-pl package, so
it should already be installed by the time you use the tool.
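To check for the shared library before running the tool, a small helper along the following lines may be useful. This is a minimal sketch rather than part of hist-pl: the function name has_shared_lib is hypothetical, and the list of directories it searches is an assumption about a typical Linux layout, extended with the entries of LD_LIBRARY_PATH. It only looks for a matching file name and does not prove that the library loads correctly.

```sh
# Hypothetical helper (not part of hist-pl): report whether a file whose name
# starts with the given prefix exists in a few common library directories or
# in any directory listed in LD_LIBRARY_PATH.
has_shared_lib() {
    lib_prefix=$1
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose: split the colon-separated path list into directories.
    set -- /lib /usr/lib /usr/local/lib /lib64 /usr/lib64 ${LD_LIBRARY_PATH:-}
    IFS=$old_ifs
    for dir in "$@"; do
        # Look directly in the directory and one level below it
        # (for multiarch layouts such as /usr/lib/<triplet>/).
        for candidate in "$dir"/"$lib_prefix"* "$dir"/*/"$lib_prefix"*; do
            if [ -e "$candidate" ]; then
                return 0
            fi
        done
    done
    return 1
}
```

For example, has_shared_lib libmorfeusz.so || echo "Morfeusz shared library not found" >&2 prints a warning when no matching file is found in the searched directories.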
The hist-pl package provides a
hist-pl command-line tool with
the following functionality:
To translate the original LMF dictionary into a binary format, use the
create mode of the
hist-pl command-line tool. Apart from the
LMF dictionary, you have to supply the PoliMorf dictionary,
which will be used to update the binary dictionary with contemporary forms:
hist-pl create srpsdp.xml PoliMorf-X.tab srpsdp.bin
Here, PoliMorf-X.tab is a version of PoliMorf and
srpsdp.bin is a directory to be created for storage of the binary dictionary.
Since the process involves creating a DAWG version of PoliMorf, it may take several minutes to complete.
Be aware that conversion from LMF to the binary format is lossy at the moment.
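Because the create step can take several minutes, it may be worth validating the arguments before starting it. The wrapper below is a sketch under stated assumptions, not part of hist-pl: the function name build_binary_dict and the HIST_PL_BIN variable are hypothetical, and refusing to run when the output path already exists is a cautious choice made here, since the text above does not say how the tool treats an existing directory. The only hist-pl invocation used is the create command shown above.

```sh
# Name of the executable; overridable so the wrapper can be tried out
# without hist-pl installed (hypothetical variable, not read by hist-pl).
HIST_PL_BIN=${HIST_PL_BIN:-hist-pl}

# Hypothetical wrapper: check the inputs, then run "hist-pl create".
# Arguments: LMF dictionary, PoliMorf file, output directory.
build_binary_dict() {
    lmf_path=$1
    polimorf_path=$2
    out_dir=$3

    # Both input files must be readable before the slow build starts.
    for input in "$lmf_path" "$polimorf_path"; do
        if [ ! -r "$input" ]; then
            echo "error: cannot read input file: $input" >&2
            return 1
        fi
    done

    # Assumption: treat an existing output path as an error instead of
    # guessing whether the tool would overwrite it.
    if [ -e "$out_dir" ]; then
        echo "error: output path already exists: $out_dir" >&2
        return 1
    fi

    "$HIST_PL_BIN" create "$lmf_path" "$polimorf_path" "$out_dir"
}
```

With the function defined, build_binary_dict srpsdp.xml PoliMorf-X.tab srpsdp.bin runs the same command as above once the checks pass.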
To convert the binary dictionary into the LMF format, use the print mode:
hist-pl print srpsdp.bin > srpsdp-prim.xml
Use the analyse mode to perform a simple dictionary-driven analysis
of the input text:
hist-pl analyse srpsdp.bin < input.txt
Every line in the input will be treated as a separate sentence.
Then, each sentence will be split on spaces and punctuation characters.
Finally, the binary dictionary will be searched for every token in the
input data and results of the search will be printed to stdout.
Run hist-pl analyse --help to learn more about the program arguments and
possible labeling options.
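The sentence and token splitting described above can be approximated with standard tools, which helps when previewing how an input file will be segmented. This is only an illustration and not the tokenizer used by hist-pl: the function names tokenize_line and tokenize_input are hypothetical, and the character classes used by tr may not recognise non-ASCII punctuation, so the result can differ from what the tool does.

```sh
# Hypothetical sketch: split one sentence on whitespace and punctuation,
# printing one token per line.
tokenize_line() {
    # Turn every space or punctuation character into a newline, squeeze
    # repeated newlines, and drop any empty line left at the start.
    printf '%s\n' "$1" | tr -s '[:space:][:punct:]' '\n' | sed '/^$/d'
}

# Treat every input line as a separate sentence; a blank line is printed
# after the tokens of each sentence.
tokenize_input() {
    while IFS= read -r line || [ -n "$line" ]; do
        tokenize_line "$line"
        echo
    done
}
```

For example, tokenize_input < input.txt lists the tokens of each line of input.txt, with sentences separated by blank lines.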
At this moment, the
analyse mode is provided for presentation purposes.
If you would like to make use of labeling results, you should use the
library API (see next section) and process the results on the application
level, depending on your goals.
The hist-pl-lexicon library, installed as a dependency of the
hist-pl wrapper package, provides a simple interface for accessing
the contents of the binary dictionary. See the
NLP.HistPL.Lexicon module for an example of the library
usage and a detailed API description.