Contemplata is a web-based annotation tool developed specifically for the purpose of the Temporal@ODIL project. The ultimate goal of this project is to annotate a portion of the ANCOR spoken French corpus with semantic (more precisely, temporal) information. To this end, Contemplata allows:
- Merging/splitting speech turns into syntactically coherent units,
- Removing (either automatically or manually) selected expressions that are uninteresting from the semantic point of view (e.g., social obligations-related expressions),
- Correcting constituency trees obtained with a syntactic parser plugged into Contemplata (it can thus be used as a purely syntactic annotation tool, regardless of its temporal annotation functionalities),
- Annotating temporal entities on top of the syntactic structures,
- Linking the entities with temporal relations.
First, clone the Contemplata repository into a local directory:
git clone https://github.com/kawu/contemplata.git
cd contemplata
Then proceed with the installation of the back-end server, the front-end annotation tool, and (optionally) the third-party syntactic analysis tools, as explained below.
To install the back-end, you will need to download and install the Haskell Tool Stack on your machine beforehand. You can use the latest stable version of the tool.
Then, move to the backend directory and run the installation process with:
cd backend
stack install
cd ..
Under Linux, this command will (by default) install the command-line tool in the
~/.local/bin directory. You can either add this directory to your $PATH, or use
the full path to run it.
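To add the directory to your $PATH, something along the following lines should work in a POSIX-compatible shell. This is only a sketch: the LOCAL_BIN variable name is made up for the example, and to make the change permanent you would put these lines in your shell's start-up file.

```shell
# Directory where `stack install` places executables by default (see above).
LOCAL_BIN="$HOME/.local/bin"

# Prepend the directory to PATH only if it is not already listed there.
case ":$PATH:" in
  *":$LOCAL_BIN:"*) ;;                  # already present, nothing to do
  *) PATH="$LOCAL_BIN:$PATH" ;;
esac
export PATH
```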
If you encounter the following error during compilation:
protoc: callProcess: runInteractiveProcess: exec: does not exist
then you need to install Protocol Buffers and retry the installation.
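If you prefer to check for the protoc executable before starting the build, a small helper such as the one below may be handy. It is a sketch only: have_tool is a hypothetical function name, and the check merely tests whether a command can be found on your $PATH.

```shell
# Succeed if the given command can be found on the PATH (hypothetical helper).
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report whether the Protocol Buffers compiler is available.
if have_tool protoc; then
  echo "protoc found: $(command -v protoc)"
else
  echo "protoc not found: install Protocol Buffers before running stack install" >&2
fi
```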
Avoid recompilation of protocol buffer files
By default, the setup tool will generate Haskell files from the protocol buffer
files (responsible for communication with the Stanford parser) each time you run
stack install. However, this step needs to be performed only once. In order to
skip it for subsequent builds, replace:
buildProtos :: Bool
buildProtos = True
with:
buildProtos :: Bool
buildProtos = False
To install the front-end application, you will need to install Elm beforehand.
WARNING: Contemplata requires Elm version 0.18 (and not the latest version
0.19, which introduced several breaking changes). Elm 0.18 can be installed
with npm using the following command:
npm install -g elm@0.18.0
Once you have Elm installed, move to the annotool directory and generate the
main.js file:
cd annotool
elm-make src/Main.elm --output=main.js
cd ..
The --output option tells the compiler to generate a JavaScript file
rather than a stand-alone HTML file. You will then need to put the main.js
file into a directory in which the web-server is run, as explained in the
setup section below.
You can optionally install one or both constituency parsers supported by Contemplata. This will allow the annotators to run these parsers directly via the annotation interface.
Let $corenlp be the directory in which you wish to put the Stanford CoreNLP
tool. You can download the tool from the CoreNLP webpage:
cd $corenlp
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2017-06-09.zip
unzip stanford-corenlp-full-2017-06-09.zip
Next, you will need to obtain an appropriate parsing model. Currently, Contemplata is configured to work with the French models only (we plan to allow other languages in future versions). These models are also available on the CoreNLP website.
Finally, you can run the CoreNLP server, supplying it with (i) the path to CoreNLP and (ii) the French models:
cd $contemplata/corenlp
./stanford-server-fr.sh $corenlp/stanford-corenlp-full-2017-06-09 $corenlp/stanford-french-corenlp-2017-06-09-models.jar
See also the README file for information about the CoreNLP French parsing model prepared within the context of the Temporal@ODIL project.
Before you can start the Contemplata application, you will need to set up an instance with its own dedicated database and configuration files.
You will need to prepare a dedicated environment to run Contemplata, i.e., a
dedicated directory where the database and all the configuration files are
stored. Under Linux, assuming that $odil is the path to the dedicated
directory, and that $contemplata is the path to the cloned Contemplata
repository, you can run the following commands to create an empty database in
this directory and to copy the configuration files, the snaplets, and the
compiled front-end into it:
mkdir $odil
cd $odil
contemplata createdb -d DB
cp -r $contemplata/config/* ./
cp -r $contemplata/backend/snaplets ./
cp -r $contemplata/annotool/main.js resources/public/
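After running the setup commands, you may want to verify that the instance directory contains the expected pieces. The sketch below assumes only the three paths that appear in those commands (DB, snaplets, and resources/public/main.js); check_instance is a hypothetical helper, not part of Contemplata, and it tests for existence only, without assuming whether DB is a file or a directory.

```shell
# Report which of the expected paths are missing from an instance directory.
# Returns 0 if everything is present, 1 otherwise (hypothetical helper).
check_instance() {
  dir="$1"
  missing=0
  for path in DB snaplets resources/public/main.js; do
    if [ ! -e "$dir/$path" ]; then
      echo "missing: $dir/$path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For instance, `check_instance "$odil"` prints the missing paths (if any) to standard error.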
You can read more about configuration in the corresponding README file.
Use the following command to run the web-server in the dedicated directory.
By default, the application uses port 8000. You can change it using the
-p option:
contemplata-server -p 8000
To start annotating, you will have to log in as administrator (login =
admin), change the password, create annotator accounts, upload
files, and assign the files to the individual annotators, as explained below.
After you set up a local Contemplata instance and run the
corresponding web-server, you will need to log in at
http://localhost:8000 as an administrator to prepare
the annotation environment. Initially, login =
admin and password =
You can change the password straight away at the Password subpage (reachable
via the top navigation bar).
At first, two Contemplata accounts are set up: admin and guest. These
accounts are intended for special use-cases:
- admin for administrative tasks,
- guest to give non-annotators access to selected documents and to
Contemplata's user guide.
You can add actual annotator accounts via the Users subpage, which contains the list of the current annotators and a form to add new annotators.
Forgotten passwords cannot be restored, but as an administrator you can change the password of an existing user. To this end, go to the Users subpage and use the form which also serves to add new annotators.
Initially, the annotation database is empty. To add new files for annotation, use the form present at the Upload subpage.
WARNING: upload only works with UTF-8-encoded files.
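To check a file's encoding before uploading it, you can ask iconv to re-encode it from UTF-8 to UTF-8, which fails on invalid byte sequences. This is a sketch that assumes iconv is installed on your system; is_utf8 is a hypothetical helper name.

```shell
# Succeed if the given file is valid UTF-8 (hypothetical helper).
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

You could then run, e.g., `is_utf8 myfile.json && echo "OK to upload"`, where myfile.json stands for the file you intend to upload.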
When you upload a file, you need to specify the name of the file which consists of three parts:
- The base name under which the file will be stored in the database.
- The annotation level of the file, which makes it possible to distinguish the various copies of (originally) the same file annotated at different levels (syntax, semantic, etc.). The set of levels can be specified in Contemplata's Dhall configuration; you can change them to better serve your annotation needs.
- The ID of the file, to distinguish several copies of the same file annotated at the same level. You can fill it, e.g., with the name of the file's annotator.
Conventionally, Contemplata uses the BASE-NAME:LEVEL:ID format (i.e., with all
the parts of the name separated with :) to refer to the file with the
given base name, annotation level, and ID.
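If you handle such names in your own scripts, the three parts can be recovered by splitting on the colon. The snippet below is a sketch for a POSIX-compatible shell; the sample name is invented for illustration and does not refer to an actual file.

```shell
# A made-up file name in the BASE-NAME:LEVEL:ID format.
name="dialogue01:syntax:alice"

# Split the name on ':' into its three parts.
IFS=: read -r base level id <<EOF
$name
EOF

echo "base name: $base"
echo "level:     $level"
echo "ID:        $id"
```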
At the moment, two upload formats are supported: generic JSON files, respecting the appropriate formatting rules, and the corpus format (.ac XML files) of the GLOZZ annotation platform, which is also handled by the ANNODIS annotation tool.
For the latter format, the tool automatically performs certain pre-processing operations. Notably, it removes the social obligations-related expressions, a step which can be avoided by unchecking the corresponding checkbox during the file's upload.
The list of files stored in the database can be found at the Files subpage. Click on the file of your choosing to see more information about it, assign annotators to it, download its JSON representation, and so on.
The list of the annotators having access to the file can be found in the Annotators section of the corresponding subpage. Each annotator can either read or read-and-write the file. To change the annotator's modification rights, click on the corresponding link in the Can modify? column. You can also add new annotators for the file using the form below, or remove the annotator from the file using the remove link.
The Show JSON link, which lets you download the JSON version of the annotated file, can be found in the General information section.
The Copy form, which lets you create a copy of the file, can be found at the bottom of the subpage. It can be useful, e.g.:
- To create a copy of the file for another user to annotate.
- When annotation of the file at a given level (e.g., syntax) is finished and you want to create a copy to annotate higher levels (e.g., semantic).
The Remove link, which lets you completely remove the file from the database, can be found in the General information section.
Each file in the database is assigned a status, which tells whether the file is:
- new -- freshly added to the database
- touched -- its annotation has been commenced
- done -- its annotation (at the given level) has been finished
Normally, the status of the file is updated automatically, based on the actions of its annotator(s). The administrator can nevertheless change it manually by clicking on the corresponding link in the General information section.
The Contemplata application suite provides the
contemplata command-line tool,
by default installed in the
~/.local/bin directory. It can be used to create a
new database, add new files to the database, convert an FTB file to the PTB
format, etc. Run:
to see the tool's available options.
Contemplata is implemented in a client/server architecture, with the advantage that the annotator does not have to install anything locally, and the server can provide the user with more advanced functionality. For instance, the server can be requested to syntactically re-analyze a given sentence in a way which takes the constraints specified directly by the annotator (e.g. a particular tokenization) into account. In the long run, the client/server architecture should also allow a more collaborative annotation style.
On the server side, Contemplata uses simple file-based storage for the annotated files. All the files are kept in the dedicated JSON format.
The web-server is implemented in Snap, a Haskell web framework. It handles regular HTTP requests (used to list the files, general administration work, etc.) as well as WebSocket requests, the latter used to communicate with the front-end annotation application.
A Temporal@ODIL-dedicated instance of the tool can be found at
You can log in as a
guest) to have a look. As a guest, you
will not be allowed to store any changes you make, but you will have access to
the user's guide and
will be able to play with the tool's functionality.
You can think of the File type as a definition of the structure against which
the JSON files can be validated. You can perform the validation programmatically.
Run stack ghci within the backend source directory and then:
import qualified Data.Aeson as JSON
import qualified Data.ByteString as BS
JSON.decodeStrict <$> BS.readFile "<path-to-json>" :: IO (Maybe File)
JSON from PTB
Contemplata provides a command-line tool which converts a file in the PTB bracketed format to the dedicated JSON format. So if you want to upload a file for annotation, it might be more convenient to prepare it in the PTB format, convert it as shown below, and upload it via the web interface afterwards.
contemplata penn2json < <file.ptb> > <file.json>
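When you have several PTB files to convert, the same command can be wrapped in a loop. The following is a sketch under a few assumptions: convert_all is a hypothetical helper, the input files are assumed to carry the .ptb extension, and each output file is written next to its input with the .json extension.

```shell
# Convert every .ptb file in the given directory to JSON (hypothetical helper).
# Relies on the `contemplata penn2json` command shown above.
convert_all() {
  dir="$1"
  for ptb in "$dir"/*.ptb; do
    [ -e "$ptb" ] || continue           # no .ptb files: skip the literal pattern
    json="${ptb%.ptb}.json"             # same name, different extension
    contemplata penn2json < "$ptb" > "$json"
  done
}
```

For example, `convert_all ./corpus` would process all the .ptb files in a (hypothetical) corpus directory.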