Structural Search

Nick Kallen edited this page May 31, 2013 · 8 revisions

This is a brief overview of the query language for searching the Perseus Treebank data, a collection of syntactically annotated ancient texts such as Homer's Iliad. Each sentence in the texts is turned into a tree, much like a sentence diagram, in a format called a dependency tree. The query language for searching these trees is the CSS3 selector language with some custom additions.

  1. To begin, you can search for any lemma by simply typing the lemma directly: φθογγή
  2. To search for all words in the accusative case, simply precede the case name with a colon (:): :accusative
  3. A term preceded by a colon is called a "pseudo-selector". Any part of speech, tense, gender, person, number, case, voice, mood, or degree can be searched with this pseudo-selector syntax, as in :accusative :optative :imperative :dual :verb
  4. To search for words that have multiple features, such as a third-person singular verb, concatenate the pseudo-selectors: :third:singular:verb
  5. To search for occurrences of a specific lemma with certain attributes, append the pseudo-selectors to the lemma: αἰτέω:first:singular:present
  6. Another pseudo-selector worth knowing is :root. It matches the "root" words of a sentence (the heads of the main clause), i.e., those words with parentId=0. Note that, in the dependency tree format, final punctuation (".", ";", and so forth) also has parentId=0. The :root selector, however, excludes punctuation.
  7. To search for a specific form (i.e., a declined or conjugated word), use a selector like [form=?], as in [form=φθογγὴν]
  8. Selectors like [a=?] are called "attribute-selectors". In addition to the form attribute-selector, the relation selector is very useful. Here is a search for all conditionals: εἰ[relation=AuxC]
  9. With everything we've learned so far, it's easy to find substantival infinitives used as subjects, isn't it? :infinitive:verb[relation=SBJ]
  10. Or we can search for genitive absolutes, my least favorite feature of Greek! :genitive[relation=SBJ]
  11. So far we've seen how to search for individual words that have certain inflectional and syntactic features. But we can also search for relationships between words or, in the jargon of dependency trees, the dependency relationships between terms. In a dependency tree, a word that depends upon or modifies another word is the child of that word. For example, when an adjective modifies a noun, the adjective is the child, the noun is the parent, and the relationship between them is ATR. Searching for parent-child relationships uses the greater-than (>) operator, as in :noun > :adjective
  12. As a more concrete example, to find all adjectives modifying μῆνις, do this: μῆνις > :adjective[relation=ATR]
  13. Note that adjectives aren't the only things that can modify nouns. Certain genitives do too, as in "Διὸς μῆνις". Here is a search for anything modifying μῆνις: μῆνις > [relation=ATR]
  14. At this point, we know enough to search for indicative verbs with accusative objects where the verb is that of a main clause: :verb:indicative:root > :accusative[relation=OBJ]
  15. But we should note that the query above is incomplete! In fact, some sentences begin with a coordinating conjunction ("but", "and", etc.). In the dependency-tree scheme, these are the parents of the main verb. So now we must express a pattern 3 levels deep: [relation=COORD]:root > :verb:indicative > :accusative[relation=OBJ]
  16. Fortunately, it's possible to combine two independent queries by using the comma (,) operator, as in selector1, selector2. So we can join the previous two queries like this: :verb:indicative:root > :accusative[relation=OBJ], [relation=COORD]:root > :verb:indicative > :accusative[relation=OBJ]
  17. At this point, we also should know how to search for future-less-vivid conditionals, which could be my favorite! :optative > εἰ[relation=AuxC] > :optative
  18. Another class of problems relates to the order of words in sentences. Since the trees themselves express only syntactic relationships, there are special operators for word-order relationships. For example, to search for "subject-verb-object" word order, you can use the :before and :after pseudo-selectors: :verb:before([relation=SBJ]):after([relation=OBJ]). In this example, :before and :after are functions that take arguments. Their arguments are themselves selectors (e.g., [relation=SBJ]), and they are evaluated relative to the parent selector. That sounds a bit complicated, but it means that :verb:before([relation=SBJ]) will only look for words with [relation=SBJ] that are also children of the verb in the dependency tree.
  19. Sometimes, however, it's simpler or more useful to search only for word order and ignore any syntactic relationship between the words. There are two ways to do this. The first uses the plus (+) operator, which looks for immediately adjacent words within a sentence, as in φίλος + γάρ + εἰμί
  20. The second is the tilde (~) operator, which looks only at word order in a sentence, ignoring whether the terms are right next to each other: φίλος ~ εἰμί
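
To make the tree and word-order operators above concrete, here is a minimal Python sketch of how `>`, `:root`, `+`, and `~` could be evaluated over a single sentence. It is an illustration only and not the search tool's actual implementation. The `Word` class, the helper names, and the hand-built annotations for the first line of the Iliad are all assumptions made for this example (including the PRED, ExD, and AuxK labels and the closing "." token, which is there only to show that `:root` skips punctuation); the real treebank annotations may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Word:
    """One node of a dependency tree (hypothetical model for this sketch)."""
    id: int                 # 1-based position in the sentence
    form: str               # inflected form as it appears in the text
    lemma: str              # dictionary headword
    relation: str           # syntactic relation to the parent
    parent_id: int          # id of the parent word; 0 means "no parent"
    features: frozenset = frozenset()  # e.g. {"noun", "accusative"}


Matcher = Callable[[Word], bool]


def lemma(name: str) -> Matcher:
    return lambda w: w.lemma == name


def relation(name: str) -> Matcher:          # like [relation=...]
    return lambda w: w.relation == name


def has(*features: str) -> Matcher:          # like :verb:indicative
    return lambda w: set(features) <= w.features


def both(*matchers: Matcher) -> Matcher:     # concatenated selectors
    return lambda w: all(m(w) for m in matchers)


def is_root(w: Word) -> bool:                # like :root (punctuation excluded)
    return w.parent_id == 0 and "punctuation" not in w.features


def child_of(sentence: List[Word], parent: Matcher, child: Matcher) -> List[Word]:
    """parent > child: words matching `child` whose parent matches `parent`."""
    parent_ids = {w.id for w in sentence if parent(w)}
    return [w for w in sentence if w.parent_id in parent_ids and child(w)]


def next_to(sentence: List[Word], first: Matcher, second: Matcher) -> List[Word]:
    """first + second: `second` words immediately preceded by a `first` word."""
    ordered = sorted(sentence, key=lambda w: w.id)
    return [b for a, b in zip(ordered, ordered[1:]) if first(a) and second(b)]


def later_than(sentence: List[Word], first: Matcher, second: Matcher) -> List[Word]:
    """first ~ second: `second` words that come anywhere after a `first` word."""
    seen_first = False
    found = []
    for w in sorted(sentence, key=lambda w: w.id):
        if seen_first and second(w):
            found.append(w)
        if first(w):
            seen_first = True
    return found


# Toy annotations written by hand for this example (assumed, not treebank data).
SENTENCE = [
    Word(1, "μῆνιν", "μῆνις", "OBJ", 2, frozenset({"noun", "accusative", "singular"})),
    Word(2, "ἄειδε", "ἀείδω", "PRED", 0, frozenset({"verb", "imperative", "singular"})),
    Word(3, "θεὰ", "θεά", "ExD", 2, frozenset({"noun", "vocative", "singular"})),
    Word(4, "Πηληϊάδεω", "Πηληϊάδης", "ATR", 5, frozenset({"noun", "genitive", "singular"})),
    Word(5, "Ἀχιλῆος", "Ἀχιλλεύς", "ATR", 1, frozenset({"noun", "genitive", "singular"})),
    Word(6, ".", ".", "AuxK", 0, frozenset({"punctuation"})),
]

if __name__ == "__main__":
    menis = lemma(SENTENCE[0].lemma)
    # Roughly: μῆνις > [relation=ATR]
    print([w.form for w in child_of(SENTENCE, menis, relation("ATR"))])
    # Roughly: :verb:root > :accusative[relation=OBJ]
    print([w.form for w in child_of(SENTENCE, both(has("verb"), is_root),
                                    both(has("accusative"), relation("OBJ")))])
```

In this sketch every operator returns the words matched by the right-hand selector, mirroring how CSS combinators select the rightmost element. A comma query (selector1, selector2) would then simply be the union of two such result lists.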