ParTAGe4XMG is a command-line tool which allows to use ParTAGe, a Haskell library dedicated to parsing tree adjoining grammars (TAGs), with XMG-generated TAG grammars.
It is recommanded to install ParTAGe4XMG using the Haskell Tool Stack, which you will need to download and install on your machine beforehand. Then:
- Create an empty directory, which will be dedicated for the ParTAGe source code,
- Clone the
xmg
branch of the ParTAGe repository into this empty directory, - Clone this repository into the same directory,
- Run
stack install
in the local copy of thepartage4xmg
repository.
Under linux, you can use the following sequence of commands to set up ParTAGe4XMG:
mkdir partage-src
cd partage-src
git clone -b xmg https://github.com/kawu/partage.git
git clone https://github.com/kawu/partage4xmg.git
cd partage4xmg
stack install
The final command will by default install the partage4xmg
command-line tool in
the ~/.local/bin
directory. You can either add this directory to your $PATH
,
or use the full path to run partage4xmg
:
$ ~/.local/bin/partage4xmg --help
To update the ParTAGe4XMG tool, use git pull
in both repositories downloaded
during installation and run stack install --force-dirty
in the local copy of
partage4xmg
.
Under linux, assuming that you are in the partage-src
directory, you can use
the following sequence of commands to perform the update:
cd partage
git pull
cd ../partage4xmg
git pull
stack install --force-dirty
The usage of --force-dirty
ensures that stack
does not overlook any of the
modifications pulled from the upstream repositories.
Provided that you have a grammar grammar.xml
file, a lexicon lemma.xml
file
and a morphology morph.xml
file, you can retrieve all the derived trees for a
given sentence using the following command:
echo "a sentence to parse" | partage4xmg parse -g grammar.xml -l lemma.xml -m morph.xml -s S
where the argument of the -s
option specifies the axiom symbol.
Note that in this mode the parser can take some time to read the grammar,
especially if the input .xml
files are big.
If you have several sentences to parse, you can write them in a single file and
provide it as input for the parser, which will then read and process them one by
one.
If you want the parser to pretty print derivations rather than derived trees,
use the -d
(--derivations
) option, as in:
echo "a sentence to parse" | partage4xmg parse -g grammar.xml -l lemma.xml -m morph.xml -s S -d
To enable support for feature structures (FSs), use the -u
(--use-features
)
option. Note that at the moment the parser reports FSs only for derivations and
not for derived trees.
Note that currently the parser provides support for flat FSs only (with the top/bottom distinction).
At the moment the command-line tool does not implement any smart tokenization strategies and it assumes that you supply the input sentences with words separated by spaces.