Sean Trott edited this page Jun 8, 2016 · 20 revisions

Overview

As described here (Khayrallah, Trott, & Feldman 2015), the Specializer receives a SemSpec as input and produces an n-tuple as output. An n-tuple contains task-specific semantic information, is focused around action specifications (e.g., move, push, etc.) and their parameters, and functions as a shared communication language between all agents in our Natural Language Understanding system. In terms of implementation, n-tuples are JSON structures mapping shared keys to values.

Below is an n-tuple for the sentence "John saw the box."; note the similarities to the SemSpec here.

(N-tuple for the sentence "John saw the box.")

The Core Specializer uses n-tuple templates to determine which aspects of the SemSpec to extract. More information about the actual design of n-tuple templates can be found on the page describing the Core Communication Modules. This section is dedicated to describing the process by which the Core Specializer produces an n-tuple, and the methods used to do this.

The Core Specializer file can be found here, and contains additional documentation on the methods.

Below is a description of the most important methods in the CoreSpecializer, as well as a walkthrough of how an n-tuple is produced for the sentence "John saw the box."

Relevant methods

###specialize(self, fs)

This is a bound instance method that takes a SemSpec, or "Feature-Structure" (fs), as input and outputs an n-tuple. Of course, a considerable amount of processing goes on between the call to specialize and the output of an n-tuple.

First, the Core Specializer checks whether the SemSpec is an utterance with discourse information; if it's not (e.g., a sentence fragment like "the red box"), the Specializer calls specialize_fragment (see below), and produces a fragmented n-tuple.

Otherwise, the Specializer identifies the "mood" of the utterance using Discourse information (e.g., "Declarative", "Imperative", etc.), identifies the corresponding mood template, then routes the "content" of the utterance to the specialize_event method.
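The dispatch described above can be sketched as follows. This is a hypothetical illustration, not the actual implementation: the dictionary structure of the SemSpec and the contents of the mood templates are assumptions made for the example.

```python
# Hypothetical sketch of the specialize dispatch; the SemSpec dict layout
# and mood-template contents are assumptions, not the real implementation.
def specialize(semspec):
    """Route a SemSpec either to fragment handling or to a mood template."""
    if not semspec.get("has_discourse"):
        # e.g. "the red box" -- no discourse info, produce a fragmented n-tuple
        return {"fragment": semspec.get("content")}
    mood_templates = {"Declarative": {"predicate_type": "assertion"},
                      "Imperative": {"predicate_type": "command"}}
    ntuple = dict(mood_templates[semspec["mood"]])
    # The "content" (an EventDescriptor) is what specialize_event would process
    ntuple["eventDescriptor"] = semspec["content"]
    return ntuple

result = specialize({"has_discourse": True, "mood": "Declarative",
                     "content": "see-event"})
```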

###specialize_event(self, content)

This takes in an EventDescriptor as input, and produces an n-tuple describing that event. Again, this consists of multiple component steps, but at the highest level, the method identifies the corresponding template for the type of EventDescriptor using the event templates. In most cases, this is a normal EventDescriptor, but in the case of conditional statements, it is a "ConditionalED".

Then, for each key/value pairing in the event template, the Core Specializer calls fill_value to fill in the template.
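The template-filling loop can be pictured as below. The template keys and the stubbed-out fill_value are illustrative assumptions; the real fill_value is considerably richer (see the next section).

```python
# Sketch of filling an event template; the template keys and the trivial
# fill_value stub are assumptions for illustration only.
event_template = {"eventProcess": None, "protagonist": None}

def fill_value(key, value, descriptor):
    # Stub: the real method dispatches on the template value (see below)
    return descriptor.get(key)

def specialize_event(descriptor, template):
    """Fill each key of TEMPLATE from the EventDescriptor via fill_value."""
    return {key: fill_value(key, value, descriptor)
            for key, value in template.items()}

ntuple = specialize_event({"eventProcess": "perceive", "protagonist": "John"},
                          event_template)
```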

###fill_value(self, key, value, input_schema)

This is one of the most important methods in the Core Specializer, since it defines procedures by which the declarative templates can guide the Specializer's actions. The method takes as input a key name, the template value, and the schema to extract the information from. A series of conditions are then evaluated; the template value is investigated to determine how to represent the final output for this key in the n-tuple. The key, meanwhile, corresponds to the same-named role in the schema.

For example, if the key is "eventProcess", and the value is the dictionary...

{'parameters': 'eventProcess'}

...the CoreSpecializer knows to call the fill_parameters method (see below) on the contents of the eventProcess role.

If the key is "protagonist", and the value is the dictionary...

{'descriptor': 'objectDescriptor'}

...the CoreSpecializer knows to call the get_objectDescriptor method (see below) on the contents of the protagonist role.

Note: More information about the declarative procedures can be found in documentation about the n-tuple templates.
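The declarative dispatch in the two examples above can be sketched like this. It is a simplified stand-in, assuming toy versions of fill_parameters and get_objectDescriptor; only the dispatch keys 'parameters' and 'descriptor' come from the examples.

```python
# Illustrative sketch (not the actual implementation) of how a template
# value declaratively selects a procedure for a given key.
def fill_value(key, value, input_schema):
    """Return the n-tuple entry for KEY, guided by the template VALUE."""
    role = input_schema.get(key, {})          # same-named role in the schema
    if isinstance(value, dict):
        if "parameters" in value:
            return {"parameters": fill_parameters(role)}
        if "descriptor" in value:
            return {"objectDescriptor": get_objectDescriptor(role)}
    return role or value                      # literal values pass through

def fill_parameters(process):                 # toy stand-in for the real method
    return {"actionary": process.get("actionary")}

def get_objectDescriptor(rd):                 # toy stand-in for the real method
    return {"type": rd.get("type"), "givenness": rd.get("givenness")}

semspec = {"eventProcess": {"actionary": "perceive"},
           "protagonist": {"type": "person",
                           "givenness": "uniquelyIdentifiable"}}
filled = fill_value("protagonist", {"descriptor": "objectDescriptor"}, semspec)
```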

###fill_parameters(self, eventProcess)

This method identifies the corresponding parameter template for the input eventProcess. Typically, these describe subtypes of the "Process" schema, but parameter templates can also be used for other schema families. If no corresponding template is found, a parent schema/template (such as "Process" for "MotionPath") is used.

The method then accomplishes two primary tasks:

  1. Fills in template: For each item in the template, the method calls fill_value (above).
  2. Inverts pointers: Additionally, the method also searches for modifiers of the eventProcess, such as adverbs or prepositional phrases (like "he ran for 2 hours"), inverts these pointers, and incorporates that information into the resulting n-tuple.
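The pointer-inversion step can be sketched as follows, under the assumption that modifier schemas carry a pointer back to the process they modify (the field name modifiedThing and the schema layout are hypothetical).

```python
# Minimal sketch of pointer inversion, assuming modifiers point back at the
# process via a hypothetical "modifiedThing" field.
def invert_pointers(event_process, semspec_schemas):
    """Fold modifiers pointing at EVENT_PROCESS into its parameters."""
    params = {"actionary": event_process["actionary"]}
    for schema in semspec_schemas:
        if schema.get("modifiedThing") is event_process:
            params.setdefault("modifiers", []).append(
                {schema["kind"]: schema["value"]})
    return params

# "he ran for 2 hours": a duration modifier points at the run process
run = {"actionary": "run"}
duration = {"modifiedThing": run, "kind": "duration", "value": "2 hours"}
params = invert_pointers(run, [duration])
```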

###get_objectDescriptor(self, item, resolving=False)

This method identifies the corresponding descriptor template (in this case, the objectDescriptor template). In our system, objectDescriptors are general descriptions of referents in the SemSpec (an "RD", or "Referent Descriptor"). The Core Specializer has no world model, so it can't actually determine a real-world referent, but it can package the information in a simple, accessible way, so that the Problem Solver can determine the real-world referent (or, in some cases, request clarification).

Besides simply filling in the values from the RD (ontological-category, givenness, gender, etc.), this method performs two key functions:

  1. Pointer inversion: it crawls the SemSpec and finds modifiers, such as adjectives or prepositional-phrases, that point to a given RD, and then incorporates this information into an objectDescriptor.
  2. Referent resolution: in the case of a pronoun or one-anaphora, the Core Specializer searches through its stack of previous referents, and attempts to unify the current objectDescriptor with a previous referent.

For example, "the red box" might output an objectDescriptor that resembles the following:

{color: red,
type: box,
givenness: uniquelyIdentifiable,
number: singular}

###get_locationDescriptor(self, goal)

This method is actually from the UtilitySpecializer, which defines several utility methods necessary for the Core Specializer (including the coreference resolution methods seen below).

In this "core" version, the method returns the type of spatial relation expressed by the semantics in the utterance, e.g. "near" or "in". If the action-side has built-in or embodied notions of spatial relations, then more detailed information from the SemSpec could theoretically be passed, rather than just a String representing the relation.

These are often used in conjunction with objectDescriptors, such as "the box near the green box". Below is a simplified version of the objectDescriptor for that description.

{type: box,
givenness: uniquelyIdentifiable,
locationDescriptor: {
relation: near,
objectDescriptor: {
type: box,
color: green}}}
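Building such a nested descriptor can be sketched as below; it mirrors "the box near the green box" above, with the construction logic assumed for illustration.

```python
# Sketch of how a locationDescriptor nests inside an objectDescriptor,
# mirroring "the box near the green box"; the helper is illustrative.
def get_locationDescriptor(relation, landmark_descriptor):
    """Package a spatial relation with the descriptor of its landmark."""
    return {"relation": relation, "objectDescriptor": landmark_descriptor}

landmark = {"type": "box", "color": "green"}
descriptor = {"type": "box",
              "givenness": "uniquelyIdentifiable",
              "locationDescriptor": get_locationDescriptor("near", landmark)}
```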

###specialize_fragment(self, fs)

This specializes the SemSpec for a sentence fragment, such as "the red one", or another non-discourse utterance. The Core Specializer has built-in procedures for the majority of the potential meanings in the core grammar. However, system integrators might want to subclass the Core Specializer and extend this method to cover domain-specific meanings as well.

Other Core Features

The Core Specializer is also fitted with several other important features, which aid in n-tuple building, and are generalizable across domains.

Coreference Resolution

As mentioned above, the Core Specializer is able to handle basic coreference resolution, both within and across utterances. The latter is particularly important for dialog systems with an autonomous agent, in which the human user shouldn't have to repeat the full description of an object with every reference.

Key methods

resolve_referents(self, item, antecedents=None, actionary=None, pred=None)

Takes in an object-descriptor (ITEM) with a referent of "antecedent". If no other arguments are passed, the Core Specializer uses, by default, its field attribute _stacked, which is a stack of object-descriptors. Alternatively, the user can pass in a list, as is the case when the referent is "addressee" (e.g., "you"); a separate stack of addressees is also maintained, for simple discourse analysis.

Recovering the referent consists of several steps:

  1. Pop the most recent referent from the _stacked list.
  2. Check if this new referent is compatible with the pronoun using the compatible_referents(self, pronoun, ref) method.
  3. (Optional) Check if this new referent is compatible with the ACTIONARY passed in.
  4. If the referent is compatible, clean the object descriptor and return it.
  5. Else, repeat steps 1-4 until a compatible referent is found. If no referent is found, return the descriptor for the original pronoun – it’s possible the Problem Solver could find a referent down the line.

compatible_referents(self, pronoun, ref)

This returns True if PRONOUN and REF are compatible, and False if not. Two object-descriptors are compatible if all of their key/value pairs are compatible in the language ontology (excepting the "referent" value).
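A sketch of that check, assuming a toy ontology table; the real method consults the language ontology for each shared key/value pair.

```python
# Illustrative compatibility check with a toy ontology; the real system
# consults the language ontology rather than this hypothetical table.
ONTOLOGY = {"box": "artifact", "person": "animate", "robot": "animate"}

def compatible_referents(pronoun, ref):
    """True if every shared key/value pair (except 'referent') is compatible."""
    for key in set(pronoun) & set(ref):
        if key == "referent":
            continue                       # 'referent' is explicitly excepted
        a, b = pronoun[key], ref[key]
        if a != b and ONTOLOGY.get(a) != ONTOLOGY.get(b):
            return False
    return True

ok = compatible_referents({"type": "person", "number": "singular"},
                          {"type": "robot", "number": "singular"})
```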

Implementation/Limitations

During specialization, object-descriptors are added to the self._stacked field whenever a role in an n-tuple template is filled. For example, the Force-Application n-tuple template contains a role for the actedUpon object:

John pushed [the box]<actedUpon> into the room.

Thus, once the Specializer fills this role, the object-descriptor is added to the self._stacked list.

This implementation differs considerably from most reference resolution systems. Notably, it uses very little syntactic information, other than the implicit fact that these roles are NP-heads. For most cases, it works as expected, because the antecedent of a pronoun usually isn't embedded within an NP:

Robot1, move to the box near the green box, then push it!

In the above sentence, it is much more likely to refer to the whole NP, the box near the green box, rather than the embedded NP, the green box. However, the limitation of this implementation is that if a referent isn't the value of a role in an n-tuple template, it won't be added to the self._stacked list -- unless, as in the case of self.get_spgValue, the code for a particular slot is ordered to do so.

Essentially, the current implementation assumes that all items on self._stacked are potential referents, and the filtering procedure then compares RD information, such as gender and number, to eliminate referents. The alternative would be to add all object-descriptors to self._stacked, and apply a more rigorous filtering procedure.

Future directions

(1) The reference resolution process could be significantly enhanced. The most obvious flaw in the current design is that the answer is deterministic, rather than probabilistic. Given that much of the Specializer is deterministic, this issue is probably not pressing; however, if someone wanted to implement a more serious treatment of reference resolution, this would be one aspect to incorporate.

(2) Another avenue for development is incorporating more syntactic information. Syntactic information is passed to the Specializer in the form of the constructional spans from the ECG Analyzer. Constructional spans give the name of a construction, and the span of words (and punctuation marks, etc.) in the original sentence it includes, e.g.:

the man ran into the room
ActiveMotionPath[2, 6]: "ran into the room"

The "left" index of the span is the start-word, and the "right" index is the index of the end-word plus one.
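The convention is the same as Python slicing, which the example above confirms:

```python
# The span convention: left index is the start word, right index is the
# end word plus one -- exactly Python's slice semantics.
sentence = "the man ran into the room".split()
span = (2, 6)                                   # ActiveMotionPath[2, 6]
covered = " ".join(sentence[span[0]:span[1]])   # "ran into the room"
```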

The UI-Agent passes the span information to the Specializer with the set_spans method, but currently, nothing is done with this information.

Previous work (Oliva et al., 2012) has incorporated syntactic information into the resolution module, and it would not be difficult to incorporate this.

(3) Finally, it would be interesting to incorporate more semantics into the reference resolution. As mentioned above, the actionary can be passed into the "resolve_referents" method, but the procedure for determining compatibility with actionaries is hard-coded; presumably, there should be a way to set this externally, whether in the grammar or in some sort of table, perhaps facilitated by the Token Tool. The general problem of semantic incompatibilities is addressed below.

Thus far, the semantics for determining compatibility has come primarily from the RD (gender, number, etc.). However, many cases of referent resolution are much more complex, and depend on the verb. Additionally, it's not just that the verb places ontological-category constraints on the antecedent, but the whole frame evoked by the verb:

the city denied the protesters a permit because they advocated/feared violence

The Winograd Schema Challenge is an example of referent resolution challenges that are grounded in semantics, not syntax, and which require detailed world knowledge to carry out. This is something ECG and the Core Specializer could likely do (with the help of a resource like FrameNet), and I hope to try this out at some point, but so far the system does not have this capability.

Resolving Semantic Incompatibilities

The Core Specializer also resolves certain semantic incompatibilities that the ECG Analyzer is unable to eliminate. Hypothetically, an ECG Grammar could be devised to eliminate these issues, but previous attempts have resulted in overly complex grammars that lose much of the compositionality that makes ECG so powerful.

Properties and Predication

Thus far, these incompatibilities have concerned the relations between subjects and their predication. Consider the following sentences:

The box weighs 2 pounds.
The box is 2 pounds.
The weight of the box is 2 pounds.

Though these sentences are all grammatically distinct, they’re conveying similar semantics, and ultimately the n-tuples are quite similar (if not exactly identical). The third sentence, however, is difficult to encode properly in the grammar, except by using what we call “Modifier-PP” constructions. There is a class of constructions that express an object’s property with the following syntax:

PROPERTY-NOUN [OF] NP

One of the Core Specializer’s additional capabilities is integrating this information in a structured way into the n-tuple. As mentioned above, the Core Specializer must invert all of the pointers to a particular RD, including subsets of the “Modification” schema. Property information that takes this format is encoded the following way:

{type: box, property: {objectDescriptor: {type: color}}}

This in and of itself is an important feature of the Core Specializer. However, more important is the aforementioned resolution of semantic incompatibilities. Because this information is not represented completely in the grammar, the Analyzer cannot rule out certain semantically incorrect assertions, such as:

The weight of the box is blue. *
The color of the box is big. *

Key methods

check_compatibility(self, predication)

This method checks whether the property described in the protagonist is compatible with the property implied by the predication. Thus, color and color are compatible, but weight and color are not. If an incompatibility is detected, the Core Specializer raises an Exception, which is similar to a sentence not parsing correctly.
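A minimal sketch of the check, assuming a token-to-property table of the kind a system integrator might define (the table and names are hypothetical):

```python
# Hypothetical sketch of check_compatibility; the token-to-property table
# is an assumed stand-in for what a system integrator would define.
PROPERTY_OF = {"blue": "color", "red": "color",
               "big": "size", "heavy": "weight"}

def check_compatibility(subject_property, predicated_value):
    """Raise if the predicate's implied property clashes with the subject's."""
    implied = PROPERTY_OF.get(predicated_value)
    if implied is not None and implied != subject_property:
        raise Exception("incompatible properties: {} vs {}".format(
            subject_property, implied))

# "The color of the box is blue." -- compatible, no exception raised
check_compatibility("color", "blue")

# "The weight of the box is blue." * -- incompatible, raises
try:
    check_compatibility("weight", "blue")
    raised = False
except Exception:
    raised = True
```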

Note that as always, meaning is context-specific and dependent on the application. For example, one application might have a meaning of red that suggests a certain weight (for example, a scale ranging from blue to red, for “danger zone”). In this case, the system integrator can add a token red that means weight, in addition to a token red that means color, and then a sentence like “the weight of the box is red” will produce the desired n-tuple. The important thing is that the mechanism exists in the Core Specializer to filter out semantically incorrect commands – it is up to the system integrator to define the right tokens for a given application.