la ilmentufa is a collection of formal grammars and syntactical parsers for the Lojban language, as well as related tools and interfaces.
camxes.peg), and the parsers have the same name as their corresponding grammar but with a
camxes.peg: Standard PEG grammar for Lojban.
camxes-beta.peg: Same as camxes.peg, but with the addition of the most popular and backward-compatible experimental cmavo and grammar changes.
camxes-beta-cbm.peg: Same as camxes-beta.peg, but with the Cmevla-Brivla Merger experimental grammar change.
camxes-beta-cbm-ckt.peg: Same as above, but with Ce-Ki-Tau experimental grammar.
camxes-exp.peg: Experimental grammar sandbox.
Main interfaces to the parsers:
camxes.html: HTML interface with various parsing options and allowing selecting the desired parser.
glosser/glosser.htm: Another HTML interface with different features, most prominently nested boxes output and glosses.
run_camxes.js: Command line interface.
ircbot/camxes-bot.js: IRC bot interface.
For generating a PEGJS grammar engine from its PEG grammar file, as well as for running the IRC bot interfaces, you need to have Node.js installed on your machine.
However, as the necessary
node_modules are already included in this project, I think you'll probably not have to download any of the aforementioned modules. ;)
Building a PEGJS parser
For generating a PEGJS parser from a
.peg grammar file, after having set the Ilmentufa directory as the working directory, run the following commands (we'll take the
camxes.peg grammar for the example):
nodejs pegjs_conv camxes.peg nodejs build-camxes camxes.pegjs
(In some installations, the keyword
nodejs doesn't work and should be replaced with
node instead in the above commands.)
The first command (with
pegjs_conv) converts the pure PEG grammar file (
*.peg) to PEGJS format, creating or updating the file
camxes.pegjs in this example.
The second command (with
build-camxes) creates or updates the corresponding parser engine,
camxes.js in this case.
Building the parser can take several dozen seconds.
Generating CBM and CKT grammars
camxes-beta-cbm-ckt.peg are generated from
camxes-beta.peg using a couple scripts.
Here's how to do:
nodejs std-to-cbm camxes-beta.peg nodejs make-ckt camxes-beta-cbm.peg
Running a parser from command line
Here's how to parse the Lojban text "coi ro do" with the standard grammar parser from command line:
nodejs run_camxes "coi ro do"
The standard grammar parser is used by default, but another grammar engine can be specified.
-stdflag selects the standard grammar engine.
-betaflag selects the Beta grammar engine.
-cbmflag selects the Cmevla-Brivla Merger grammar engine.
-cktflag selects the Ce-Ki-Tau grammar engine.
-expflag selects the experimental or sandbox grammar engine.
-p PATHcan be used for selecting a parser by giving its file path as a command line argument.
-m MODE can be used to specify output postprocessing options.
Here, MODE can be any letter string, each letter standing for a specific option.
Here is the list of possible letters and their associated meaning:
- M -> Keep morphology
- S -> Show spaces
- C -> Show word classes (selmaho)
- T -> Show terminators
- N -> Show main node labels
- R -> Raw output, do not prune the tree, except the morphology if 'M' not present.
- J -> JSON output
- G -> Replace words by glosses
- L -> Run the parser in a loop, consume every input line terminated by a newline and output parsed result
- L -> A second 'L' means that run_camxes will expect every input line to begin with a mode string (possibly empty) followed by a space, after which the actual input follows.
nodejs run_camxes -m CTN "coi ro do"
This will show terminators, selmaho and main node labels.
Running the IRC bots
Nothing easier; after having entered the ilmentufa directory, run the command
nodejs ircbot/camxes-bot or
nodejs ircbot/cipra-bot (the latter is for the experimental grammar).
The list of the channels joined by the bot can be found and edited within the bot script.
How to use one of these parsers in a HTML interface project
.js parser (e.g.
to your HTML interface.
You may also want including
camxes_postproc.js, which provide useful features. The former does optional preprocessing of Lojban text, such as replacing digits with the corresponding PA cmavo, converting nonstandard spellings or scripts into normal Latin-based Lojban text. The latter script, the postprocessor, provides a function for trimming or prettifying the parse tree generated by the parser in function of the chosen postprocessing options.
Here's a simple example code:
The postproc options are the sames as those of run_camxes.js described earlier in this file; please refer to the section
Running a parser from command line above for more details.
You can also look into
camxes.html's code to see a real example of using the Lojban parsers in a HTML interface.
You can run a parser by making your program execute the following example command line:
nodejs run_camxes.js -m J camxes.js "jbobau vlamei"
(You can adapt it by providing other options flags described in the
Running a parser from command line section above, but you'll want to keep the J option flag, so
run_camxes.js outputs a JSON stringified array that will be easy for your program to read.)
run_camxes.js will then write the output parse tree in its
stdout, so you'll need your program to read its
stdout to get the parse result.
For example, in a Python script, you can execute run_camxes.js with the
subprocess Python module, and then read its output this way:
import subprocess try: import simplejson as json except (ImportError,): import json command = ['nodejs', 'run_camxes.js', '-m', 'J', 'jbobau vlamei'] output = subprocess.check_output(command) json_string = output.decode("utf-8") # Converting the JSON output back to nested lists: parse_tree = json.loads(json_string)