nilsreiter/generic-xml-reader

generic-xml-reader

A library for reading arbitrary XML content (including TEI) into UIMA, translating inline structural annotation into stand-off annotation.

Installation

<dependency>
  <groupId>de.unistuttgart.ims.uima.io</groupId>
  <artifactId>generic-xml-reader</artifactId>
  <version>2.0.1</version>
</dependency>

Usage

This package provides a few classes to convert inline XML into UIMA-based stand-off annotation. Rules specify how inline XML elements are mapped onto UIMA annotation types.

Within certain limits, the package can also be used to export into inline XML.
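To make the term "stand-off" concrete, here is a minimal sketch that uses only the JDK's StAX parser. It is not this library's implementation, and the names StandOffSketch, Span, Result and convert are made up for illustration. It strips the tags from an XML string, keeps the plain text, and records for each element the character offsets at which it began and ended.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustration only: NOT the generic-xml-reader implementation.
class StandOffSketch {

    /** A stand-off annotation: element name plus offsets into the plain text. */
    static final class Span {
        final String tag;
        final int begin; // inclusive
        final int end;   // exclusive

        Span(String tag, int begin, int end) {
            this.tag = tag;
            this.begin = begin;
            this.end = end;
        }
    }

    /** The tag-free text and the spans pointing into it. */
    static final class Result {
        final String text;
        final List<Span> spans;

        Result(String text, List<Span> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    static Result convert(String xml) {
        StringBuilder text = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        Deque<Integer> begins = new ArrayDeque<>(); // start offsets of open elements
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        // Remember where this element's content starts.
                        begins.push(text.length());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        // The element covers everything appended since it was opened.
                        spans.add(new Span(reader.getLocalName(), begins.pop(), text.length()));
                        break;
                    default:
                        break; // comments, processing instructions etc. are ignored
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return new Result(text.toString(), spans);
    }

    public static void main(String[] args) {
        Result r = convert("<sp><speaker>DER PRINZ</speaker><p>Klagen!</p></sp>");
        for (Span s : r.spans) {
            System.out.println(s.tag + " [" + s.begin + ", " + s.end + "): "
                    + r.text.substring(s.begin, s.end));
        }
    }
}
```

In this sketch every element becomes a span. The library's rules instead select which elements to keep and which UIMA annotation type each one maps to.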

Example

Let's consider an example XML snippet:

<sp who="#der_prinz">
    <speaker>DER PRINZ</speaker>
    <stage>
        <hi>an einem Arbeitstische, voller Briefschaften und Papiere, 
            deren einige er durchläuft.</hi>
    </stage>
    <p> Klagen, nichts als Klagen! Bittschriften, nichts als 
       Bittschriften! – Die traurigen Geschäfte; und man beneidet uns 
       noch! – Das glaub' ich; wenn wir allen helfen könnten: dann 
       wären wir zu beneiden. – Emilia? <hi>Indem er noch eine von den 
       Bittschriften aufschlägt, und nach dem unterschriebnen Namen 
       sieht.</hi>
    </p>
</sp>

We now create a new object of the class GenericXmlReader, specify a few rules, and read in the XML string (assuming it's in a variable called xmlString):

GenericXmlReader<DocumentMetaData> gxr = new GenericXmlReader<DocumentMetaData>(DocumentMetaData.class);
gxr.addRule("speaker", Speaker.class);
gxr.addRule("stage", StageDirection.class);
gxr.addRule("hi", StageDirection.class);
gxr.addRule("sp", Utterance.class, (utterance, xmlElement) -> {
	utterance.setWho(xmlElement.attributeValue("who"));
});
JCas jcas = gxr.read(IOUtils.toInputStream(xmlString, "UTF-8"));

The JCas now contains the entire text of the snippet, and several annotation layers according to the mapping rules. Plus, we have set a feature value of a UIMA annotation based on the attribute value of an XML element.
