Hiroshi Noji edited this page Feb 25, 2016 · 8 revisions

Attributes for generic tags:

  • token

    • @form = surface form (the, 京都)

    • @pos = POS tag (DT, 名詞)

    • @lemma = canonical or base (dictionary) form

  • dependency = a dependency arc between two tokens

    • @head = id of the head element (usually a token) in the sentence (ROOT when the dependent is the sentence's root token)

    • @dependent = id of the dependent element

    • @deprel = relation label

  • span = a span in a constituent tree

    • @type = "preterminal" or "nonterminal" (possibly also "empty"?); determines how the ids in @children are interpreted

    • @symbol = nonterminal symbol of the span

    • @children = ids of the children, ordered from left to right (the id of the token if @type="preterminal"?)
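
As a rough illustration of how these attributes fit together, here is a minimal Python sketch that reads them with the standard library. The sample XML is hypothetical: the wrapper elements (sentence, tokens, dependencies, parse), the @id attributes, the space-separated @children format, and the reading of preterminal children as token ids are all assumptions made for the example, not something this page specifies.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the token/dependency/span attribute names
# come from the list above. Everything else is assumed for illustration.
SAMPLE = """
<sentence id="s0">
  <tokens>
    <token id="t0" form="the" pos="DT" lemma="the"/>
    <token id="t1" form="dog" pos="NN" lemma="dog"/>
    <token id="t2" form="barks" pos="VBZ" lemma="bark"/>
  </tokens>
  <dependencies>
    <dependency head="t1" dependent="t0" deprel="det"/>
    <dependency head="t2" dependent="t1" deprel="nsubj"/>
    <dependency head="ROOT" dependent="t2" deprel="root"/>
  </dependencies>
  <parse>
    <span id="sp0" type="nonterminal" symbol="S" children="sp1 sp4"/>
    <span id="sp1" type="nonterminal" symbol="NP" children="sp2 sp3"/>
    <span id="sp2" type="preterminal" symbol="DT" children="t0"/>
    <span id="sp3" type="preterminal" symbol="NN" children="t1"/>
    <span id="sp4" type="nonterminal" symbol="VP" children="sp5"/>
    <span id="sp5" type="preterminal" symbol="VBZ" children="t2"/>
  </parse>
</sentence>
"""


def index_by_id(sentence, tag):
    """Map the (assumed) @id attribute to each element with the given tag."""
    return {e.get("id"): e for e in sentence.iter(tag)}


def root_forms(sentence):
    """Surface forms of tokens whose dependency arc has @head = ROOT."""
    tokens = index_by_id(sentence, "token")
    return [
        tokens[d.get("dependent")].get("form")
        for d in sentence.iter("dependency")
        if d.get("head") == "ROOT"
    ]


def span_yield(sentence, span_id):
    """Surface forms covered by a span, in left-to-right order."""
    tokens = index_by_id(sentence, "token")
    spans = index_by_id(sentence, "span")

    def walk(sid):
        span = spans[sid]
        child_ids = span.get("children").split()
        if span.get("type") == "preterminal":
            # Assumption: children of a preterminal are token ids.
            return [tokens[c].get("form") for c in child_ids]
        # Otherwise the children are treated as span ids.
        return [form for c in child_ids for form in walk(c)]

    return walk(span_id)


sentence = ET.fromstring(SAMPLE.strip())
print(root_forms(sentence))         # ['barks']
print(span_yield(sentence, "sp1"))  # ['the', 'dog']
```

The branch on @type is the point of the sketch: the same @children attribute holds token ids in one case and span ids in the other, so a reader of the XML has to check @type before resolving an id.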

Attributes for language-specific tags:

Japanese

  • token
    • @pos1
    • @pos2
    • @pos3
    • @cType
    • @cForm
    • @yomi = reading of the token (読み): オショウガツ
    • @pron = pronunciation of the token (発音): オショーガツ
    • @misc = miscellaneous information from jumandic: "代表表記:京都/きょうと 地名:日本:府"
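
The @misc value packs several fields into one string. The following Python sketch shows one way to split it, assuming that fields are separated by whitespace and that the first colon in each field separates the key from the value; both are guesses based on the single example above. The token element itself (its @id and the choice of attributes) is also hypothetical, and parse_misc is a helper name invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical token element; the @misc value is the example from the list above.
TOKEN_XML = (
    '<token id="t0" form="京都" pos="名詞" '
    'misc="代表表記:京都/きょうと 地名:日本:府"/>'
)


def parse_misc(misc):
    """Split a @misc string into a dict.

    Assumes whitespace-separated fields of the form key:value, where the
    value may itself contain colons. A field without a colon is kept as
    a key with an empty value.
    """
    fields = {}
    for item in misc.split():
        key, _, value = item.partition(":")
        fields[key] = value
    return fields


token = ET.fromstring(TOKEN_XML)
info = parse_misc(token.get("misc", ""))
print(info["代表表記"])  # 京都/きょうと
print(info["地名"])      # 日本:府
```

Splitting only on the first colon keeps nested values such as 日本:府 intact, leaving any further decomposition to the caller.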