Annotation Definitions

The annotations returned from the parse function are defined below. Each entry gives the keyword of a node in the tree (nested maps), its type with any domain information, and a description. When a domain is given, it uses conventional math notation. For example:

(tuple: [start, end))

is a range of numbers that is inclusive at the start and exclusive at the end. Each index or range is relative to either the whole ("absolute") utterance or a sentence. In the latter case, it counts from the beginning of the sentence that contains it.
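
As an illustration of the range conventions, here is a minimal Python sketch. It is not part of the library: the helper names (`slice_half_open`, `slice_one_based`, `to_utterance_range`) and the sample text are hypothetical, and the handling of 1-based ranges (shifting both bounds down by one) is an assumption about how those ranges map onto 0-based sequences.

```python
def slice_half_open(items, rng):
    """Return the elements covered by a 0-based half-open [start, end) range."""
    start, end = rng
    return items[start:end]


def slice_one_based(items, rng):
    """Same for a 1-based [start, end) range: shift both bounds down by one."""
    start, end = rng
    return items[start - 1:end - 1]


def to_utterance_range(rng, sentence_offset):
    """Shift a sentence-relative range by the sentence's offset in the utterance."""
    start, end = rng
    return (start + sentence_offset, end + sentence_offset)


# Hypothetical utterance with two sentences, tokenized by hand.
text = "I like Clojure. It is fun."
tokens = ["I", "like", "Clojure", ".", "It", "is", "fun", "."]

# 0-based utterance character range [7, 14) covers "Clojure".
clojure_chars = slice_half_open(text, (7, 14))

# 1-based utterance token range [3, 4) covers the third token only.
clojure_token = slice_one_based(tokens, (3, 4))

# The second sentence starts at 0-based token offset 4, so the
# sentence-relative range [0, 2) becomes the utterance range [4, 6).
second_sentence_offset = 4
it_is = slice_half_open(tokens, to_utterance_range((0, 2), second_sentence_offset))
```

Because the end bound is exclusive, the length of a range is simply `end - start`, and adjacent ranges such as `[0, 2)` and `[2, 5)` do not overlap.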

An example of an annotation parse data structure is given here.

The annotation definitions follow:

  • text (string): Entire utterance text that creates the complete annotation tree.
  • sentiment (integer: [-2, 2]): The sentiment score of the entire utterance.
  • mentions (list): List of NER mentions of the utterance.
    • char-range (tuple: [start, end)): The 0-based utterance index character range.
    • token-range (tuple: [start, end)): The 1-based utterance index token range.
    • ner-tag (string): The named entity tag of the mention (ex: ORGANIZATION).
    • entity-type (string): The named entity type of the mention, which is usually the ner-tag (ex: PERSON).
    • sent-index (integer): The 0-based index of the sentence in which the mention is found.
    • text (string): The text of the full mention.
  • tok-re-mentions (list): List of custom NER mentions of the utterance using the token regular expression namespace. It has the same child nodes as mentions.
  • sents (list): Contains a list of sentences.
    • text (string): The text of the sentence.
    • sent-index (integer): The 0-based index of the sentence.
    • sentiment (integer: [-2, 2]): The sentiment score of the sentence.
    • parse-tree (list): List of composite parse children nodes.
      • index-range (tuple: [start, end)): The 0-based sentence index token range of the current node and all children.
      • token-index (integer): The 1-based sentence index of the token.
      • sentiment (integer: [-2, 2]): The node's sentiment score. For a non-leaf node, the score is aggregated from its child nodes.
      • label (string): The node's token text for leaf nodes, or the type of a branch node (e.g. a noun phrase uses NP).
      • child (list): List of composite child nodes (see parse-tree).
    • dependency-parse-tree (list): List of composite parse children nodes.
      • dep (string): The dependency type of this node to the parent.
      • token-index (integer): The 1-based sentence index of the token.
      • text (string): The token text the node represents.
    • tokens (list): The list of tokens for the sentence.
      • text (string): The token text.
      • sent-index (integer): The 0-based utterance index of the sentence in which the token is contained.
      • index-range (tuple: [start, end)): The 0-based sentence index token range of the current node and all children.
      • token-index (integer): The 1-based sentence index of the token.
      • token-range (tuple: [start, end)): The 1-based utterance index token range.
      • char-range (tuple: [start, end)): The 0-based utterance index character range.
      • lemma (string): The lemmatized token.
      • stopword (boolean): true if the surface form of the word is a stop token or false otherwise.
      • stoplemma (boolean): true if the lemmatized surface form of the token is a stop word or false otherwise.
      • sentiment (integer: [-2, 2]): The sentiment score of the token.
      • pos-tag (string): The part of speech tag (ex: NN).
      • ner-tag (string): The named entity tag of the token (ex: PERSON).
      • entity-type (string): The named entity type of the token, which is usually the ner-tag (ex: ORGANIZATION).
      • natlog (map): Map of child nodes that represent the natural logic information of the token.
        • polarity: The natural logic polarity of the token.
        • operator: Map of operator information for quantifier tokens.
        • object-token-range (tuple: [start, end)): The 1-based sentence index of the object token(s).
        • subject-token-range (tuple: [start, end)): The 1-based sentence index of the subject token(s).
        • quantifier-token-range (tuple: [start, end)): The 1-based sentence index of the quantifier token(s).
        • quantifier-token-head-index: The 0-based sentence index of the head token.
      • srl (map): Semantic Role Labeler (SRL) output for the token.
        • id (integer): A 1-based identifier for this token that is unique within the sentence. This is used to connect SRL nodes based on their dependency relationship.
        • head-id (integer): The SRL id of this node's parent.
        • heads (list): A list of maps containing the functional dependency relationship to the parent (head) of this token node.
          • dependency-label (string): The functional dependency between this token node and the parent (head). This is also known as the argument (ex: A1).
          • function-tag (string): The type tag of the dependency between this token node and the parent (ex: PPT).
        • dependency-label (string): The type of dependency relation between this token node and its parent (ex: ccomp).
        • propbank (string): The propbank tag of the token (ex: want.01).
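
To show how these definitions fit together, the following Python sketch walks a small annotation tree. Everything in it is an assumption made for illustration: the data is written by hand rather than produced by the parse function, it is heavily abbreviated (most keys are omitted), it uses plain dicts with string keys, and the helpers `mention_texts` and `leaf_labels` are hypothetical names.

```python
# Hand-written, abbreviated annotation for one short utterance (not real output).
annotation = {
    "text": "Obama visited Paris.",
    "sentiment": 0,
    "mentions": [
        {"char-range": (0, 5), "token-range": (1, 2),
         "ner-tag": "PERSON", "sent-index": 0, "text": "Obama"},
        {"char-range": (14, 19), "token-range": (3, 4),
         "ner-tag": "LOCATION", "sent-index": 0, "text": "Paris"},
    ],
    "sents": [{
        "text": "Obama visited Paris.",
        "sent-index": 0,
        "parse-tree": [{
            "label": "S",
            "child": [
                {"label": "NP", "child": [{"label": "Obama", "token-index": 1}]},
                {"label": "VP", "child": [
                    {"label": "visited", "token-index": 2},
                    {"label": "NP", "child": [{"label": "Paris", "token-index": 3}]},
                ]},
                {"label": ".", "token-index": 4},
            ],
        }],
    }],
}


def mention_texts(ann):
    """Recover each mention's text from its 0-based [start, end) char-range."""
    texts = []
    for mention in ann["mentions"]:
        start, end = mention["char-range"]
        texts.append(ann["text"][start:end])
    return texts


def leaf_labels(nodes):
    """Collect leaf labels (token text) left to right from composite parse nodes."""
    labels = []
    for node in nodes:
        children = node.get("child") or []
        if children:
            labels.extend(leaf_labels(children))
        else:
            labels.append(node["label"])
    return labels


recovered = mention_texts(annotation)
leaves = leaf_labels(annotation["sents"][0]["parse-tree"])
```

Slicing the utterance text with each `char-range` should reproduce the mention's own `text` value, which makes a convenient sanity check when consuming the annotations. The recursive walk relies on the composite structure of `parse-tree`, where each `child` entry has the same shape as its parent.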