a major mode for GNU Emacs for annotations in a stand-off manner

purcell/standoff-mode

standoff-mode

standoff-mode is a major mode for GNU Emacs that lets you create annotations on texts in a stand-off manner. It is written for use in the field of digital humanities and the manual annotation of training data for named-entity recognition.

There are several tools for creating stand-off markup. Most of them have to be deployed on a server in a network environment, which can be a barrier. In contrast, standoff-mode needs no network environment. It aims to let you start annotating texts right away.

standoff-mode is designed to store markup in several formats: as dumped Lisp expressions (implemented); in a remote or local SQL database, or as RDF triples in a SPARQL endpoint following the emerging standard defined in the OpenAnnotation ontology (both on the roadmap); or as local files following BRAT's plain-text format (planned).
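To give an idea of what a plain-text stand-off file looks like, here is a minimal Python sketch that writes annotations as lines in BRAT's standoff format. It is only an illustration: standoff-mode's BRAT back-end is planned, not implemented, and the function names `brat_text_bound` and `brat_relation`, the sample sentence and the type names are all assumptions made for this example.

```python
# Sketch only: formats annotations as lines of a BRAT-style .ann file.
# All names here are hypothetical and not part of standoff-mode.

SOURCE = "Sony opened an office in Berlin."


def brat_text_bound(ann_id, ann_type, spans, source):
    """Format a text-bound annotation; several spans make it discontinuous."""
    offsets = ";".join(f"{start} {end}" for start, end in spans)
    text = " ".join(source[start:end] for start, end in spans)
    return f"{ann_id}\t{ann_type} {offsets}\t{text}"


def brat_relation(rel_id, rel_type, arg1, arg2):
    """Format a binary relation between two text-bound annotations."""
    return f"{rel_id}\t{rel_type} Arg1:{arg1} Arg2:{arg2}"


lines = [
    brat_text_bound("T1", "Organization", [(0, 4)], SOURCE),
    brat_text_bound("T2", "Location", [(25, 31)], SOURCE),
    brat_relation("R1", "LocatedIn", "T1", "T2"),
]
print("\n".join(lines))
```

Each line stands on its own and points into the source by character offsets, so the source file itself never has to be modified.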

standoff-mode does not try to be everything under one roof. It is just a tool for the manual annotation of texts. Statistics have to be done with another tool.

Since it was written for the field of digital humanities, literary studies in particular, standoff-mode works not only with plain-text input (source) files, but also with XML. Semantic stand-off markup produced with it can therefore reference structural markup encoded in TEI/P5, which can be an advantage for further processing.

Stand-off Markup

Stand-off markup, also known as external markup, means the following:

  • Stand-off markup refers to a source document through pointers of some kind. standoff-mode uses character offsets.

  • It is contained in an external document (or a database).

  • The source document is left unchanged and may be read-only.

  • The source document may contain markup too, called internal markup. standoff-mode makes XML source documents easier to read by hiding tags and showing glyphs for character references.

Cf. the TEI/P5 guidelines on stand-off markup and the OpenAnnotation ontology.
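The idea of pointing into an unchanged source by character offsets can be sketched in a few lines of Python. This is not standoff-mode's internal data structure; the `Annotation` class, the sample sentence and the markup types are assumptions for illustration, and the sketch uses Python's 0-based, end-exclusive offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One stand-off markup element (hypothetical structure)."""
    markup_type: str
    start: int  # offset of the first character, 0-based
    end: int    # offset just past the last character


# The source is never modified; annotations live outside of it.
SOURCE = "Goethe met Schiller in Weimar."

ANNOTATIONS = [
    Annotation("person", 0, 6),
    Annotation("person", 11, 19),
    Annotation("place", 23, 29),
]


def annotated_text(source, annotation):
    """Resolve an annotation's offsets against the source text."""
    return source[annotation.start:annotation.end]


for ann in ANNOTATIONS:
    print(ann.markup_type, annotated_text(SOURCE, ann))
```

Because the annotations only hold offsets, they stay valid exactly as long as the source text does not change, which is why the source may be kept read-only.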

Features

  • allows discontinuous markup

  • allows relations between markup elements (RDF-like directed graphs)

  • allows attributes on markup elements

  • allows text comments anchored on one or several markup elements

  • generates the configuration for your annotation schema from OWL via XSLT

  • lets you customize how restrictive annotation is: restricted to the annotation schema plugged in via the configuration (a priori), restricted to the schema already in use (a posteriori), or free

  • offers completion of user input of markup types, relation predicates and attribute names

  • hides the fully qualified names (IRIs) of markup types, predicates and attributes behind labels (from OWL or RDFS), customizable

  • customization of highlighting faces

  • everything can be done with the keyboard and key codes

  • several pluggable back-ends (under development)

  • manual based on GNU Texinfo, English (under development) and German
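Two of the features above, discontinuous markup and relations between markup elements, can be pictured with a small Python sketch. It is a hedged illustration under assumed names (`MarkupElement`, `Relation`, `fragments`, `outgoing`, and the predicate `hasObject` are all hypothetical), not a description of how standoff-mode stores its data.

```python
from dataclasses import dataclass


@dataclass
class MarkupElement:
    """A markup element covering one or more character ranges."""
    element_id: str
    markup_type: str
    ranges: list  # list of (start, end) pairs, 0-based, end-exclusive


@dataclass
class Relation:
    """A directed edge between two markup elements, like an RDF triple."""
    subject: str
    predicate: str
    obj: str


SOURCE = "She picked the book, after some hesitation, up."

ELEMENTS = {
    # "picked ... up" is one element spread over two ranges.
    "m1": MarkupElement("m1", "action", [(4, 10), (44, 46)]),
    "m2": MarkupElement("m2", "object", [(11, 19)]),
}

RELATIONS = [Relation("m1", "hasObject", "m2")]


def fragments(source, element):
    """Return the text of each range of a (possibly discontinuous) element."""
    return [source[start:end] for start, end in element.ranges]


def outgoing(relations, subject_id):
    """Return (predicate, object) pairs of all edges leaving an element."""
    return [(r.predicate, r.obj) for r in relations if r.subject == subject_id]


print(fragments(SOURCE, ELEMENTS["m1"]))
print(outgoing(RELATIONS, "m1"))
```

Keeping relations as subject, predicate and object triples is what makes the markup read as a directed graph and maps naturally onto RDF.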

Roadmap

standoff-mode is under active development. Here's the roadmap:

  • text comments

  • SPARQL back-end

  • SQL back-end

  • BRAT-like back-end

Requirements

Only GNU Emacs is required. Once the editor is installed, the standoff-mode package has to be installed as well. It was tested on Windows, Linux and Mac with Emacs versions 24.3 and 24.5.

If you want to store your markup in SQL tables or as RDF triples, an RDBMS or a SPARQL endpoint is required.
