Hiroshi Noji edited this page Aug 1, 2016 · 2 revisions

Unlike StanfordCoreNLP, Jigg's XML encodes tag-specific information as attributes rather than as child elements. For example, the following <token> in StanfordCoreNLP

<token id="1">
  <word>Stanford</word>
  <lemma>Stanford</lemma>
  <CharacterOffsetBegin>0</CharacterOffsetBegin>
  <CharacterOffsetEnd>8</CharacterOffsetEnd>
</token>

is represented in Jigg as

<token id="s0_1" form="Stanford" lemma="Stanford" CharacterOffsetBegin="0" CharacterOffsetEnd="8"/>

The main characteristics of Jigg's XML are:

  • Each element (e.g., token) has an id (e.g., s0_1) that is unique within the XML. In StanfordCoreNLP, these ids are not unique.
  • Some information (e.g., the surface form) is stored under a different name (e.g., form rather than word).
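
The mapping between the two formats can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration, not part of Jigg: the function name to_jigg_token, the RENAMES table, and the id scheme (s, the sentence index, an underscore, then the StanfordCoreNLP token id) are assumptions inferred from the single example above.

```python
import xml.etree.ElementTree as ET

# Assumed rename table: only word -> form is stated in the text above.
# Any other child element keeps its name when it becomes an attribute.
RENAMES = {"word": "form"}


def to_jigg_token(corenlp_token, sentence_index):
    """Flatten a StanfordCoreNLP-style <token> into an attribute-only <token>."""
    # Assumed id scheme: prefix the per-sentence token id with the sentence
    # index so that the id is unique across the whole document.
    attrs = {"id": f"s{sentence_index}_{corenlp_token.get('id')}"}
    for child in corenlp_token:
        # Each child element becomes one attribute on the new token.
        attrs[RENAMES.get(child.tag, child.tag)] = (child.text or "").strip()
    return ET.Element("token", attrs)


corenlp_xml = """
<token id="1">
  <word>Stanford</word>
  <lemma>Stanford</lemma>
  <CharacterOffsetBegin>0</CharacterOffsetBegin>
  <CharacterOffsetEnd>8</CharacterOffsetEnd>
</token>
"""

token = to_jigg_token(ET.fromstring(corenlp_xml), sentence_index=0)
print(ET.tostring(token, encoding="unicode"))
```

The resulting element has no children and carries the same five attributes as the Jigg example above (id, form, lemma, CharacterOffsetBegin, CharacterOffsetEnd).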