generic-xml-reader

A library for reading arbitrary XML content (including TEI) into UIMA, translating inline structural markup into stand-off annotation.

Installation

<dependency>
  <groupId>de.unistuttgart.ims.uima.io</groupId>
  <artifactId>generic-xml-reader</artifactId>
  <version>1.4.0</version>
</dependency>

Usage

This package provides a few classes to convert inline XML into UIMA-based stand-off annotation. Rules specify how inline XML elements are mapped onto UIMA annotation types.

Within certain limits, the package can also be used to export into inline XML.
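To make the idea of stand-off annotation concrete, here is a minimal sketch of the underlying technique using only the JDK's SAX parser. It is not this library's implementation and does not use UIMA; the names StandOffSketch, Span, Result and convert are hypothetical and chosen for illustration. The sketch strips the tags from the input, collects the plain text, and records one span with character offsets per XML element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration, not part of generic-xml-reader.
public class StandOffSketch {

    /** One stand-off annotation: element name plus offsets into the plain text. */
    public static final class Span {
        public final String name;
        public final int begin;
        public final int end;

        Span(String name, int begin, int end) {
            this.name = name;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return name + "[" + begin + "," + end + ")";
        }
    }

    /** The tag-free text together with the spans that point into it. */
    public static final class Result {
        public final String text;
        public final List<Span> spans;

        Result(String text, List<Span> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    public static Result convert(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        Deque<Integer> begins = new ArrayDeque<>(); // start offsets of open elements

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Remember where this element starts in the plain text.
                begins.push(text.length());
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Character data becomes part of the document text.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // The element covers everything appended since its start tag.
                spans.add(new Span(qName, begins.pop(), text.length()));
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return new Result(text.toString(), spans);
    }

    public static void main(String[] args) throws Exception {
        Result r = convert("<sp><speaker>DER PRINZ</speaker><p>Klagen!</p></sp>");
        System.out.println(r.text);
        for (Span s : r.spans) {
            System.out.println(s);
        }
    }
}
```

Spans are emitted in the order in which their elements close, so nested elements appear before their parents. A rule-based mapping such as the one this package offers would, roughly speaking, replace the generic Span with a typed annotation chosen by element name.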

Example

Let's consider an example XML snippet:

<sp who="#der_prinz">
    <speaker>DER PRINZ</speaker>
    <stage>
        <hi>an einem Arbeitstische, voller Briefschaften und Papiere, 
            deren einige er durchläuft.</hi>
    </stage>
    <p> Klagen, nichts als Klagen! Bittschriften, nichts als 
       Bittschriften! – Die traurigen Geschäfte; und man beneidet uns 
       noch! – Das glaub' ich; wenn wir allen helfen könnten: dann 
       wären wir zu beneiden. – Emilia? <hi>Indem er noch eine von den 
       Bittschriften aufschlägt, und nach dem unterschriebnen Namen 
       sieht.</hi>
    </p>
</sp>

We now create a new object of the class GenericXmlReader, specify a few rules, and read in the XML string (assuming it's in a variable called xmlString):

GenericXmlReader<DocumentMetaData> gxr = new GenericXmlReader<DocumentMetaData>(DocumentMetaData.class);
gxr.addRule("speaker", Speaker.class);
gxr.addRule("stage", StageDirection.class);
gxr.addRule("hi", StageDirection.class);
gxr.addRule("sp", Utterance.class, (utterance, xmlElement) -> {
	utterance.setWho(xmlElement.attributeValue("who"));
});
JCas jcas = gxr.read(IOUtils.toInputStream(xmlString, "UTF-8"));

The JCas now contains the entire text of the snippet and several annotation layers created according to the mapping rules. In addition, the callback on the sp rule has set a feature value of the UIMA annotation (who) from the attribute value of the XML element.
