# Ambiguity

To understand what these challenges mean to a practitioner of NLP, we first have to understand what the goal of NLP is and how we approach NLP problems in general. These challenges require good design techniques: modular approaches that break a problem into smaller challenges at appropriate points, and more formal models that reflect aspects of the structure of language. NLP problems differ from, and are somewhat more challenging than, typical machine learning problems because of two main facets of language: ambiguity and compositionality.

Ambiguity can be defined as the capacity to have more than one meaning or to be understood in more than one way. Natural languages are ambiguous, so computers are not able to understand language the way people do. Natural Language Processing (NLP) is concerned with the development of computational models of aspects of human language processing. Ambiguity can occur at various levels of NLP: lexical (word level), syntactic (dealing with the order of words), semantic (dealing with the meaning of words), pragmatic (dealing with contextual meaning), and so on. A quick example explains this better.

The sentence "You have the red light" is ambiguous. Without knowing the context, the identity of the speaker or the speaker's intent, it is difficult to infer the meaning with certainty. For example, it could mean:

* the space that belongs to you has red ambient lighting;
* you are stopping at a red traffic signal, and you have to wait to continue driving;
* you are not permitted to proceed in a non-driving context;
* your body is cast in a red glow; or
* you possess a light bulb that is tinted red.

Similarly, the sentence "Greyson saw the man with glasses" could mean that Greyson observed the man by using glasses, or that Greyson observed a man who was holding or wearing glasses (syntactic ambiguity). The meaning of the sentence depends on an understanding of the context and the speaker's intent. As defined in linguistics, a sentence is an abstract entity (a string of words divorced from non-linguistic context), in contrast to an utterance, which is a concrete instance of a speech act in a specific context. The more closely speakers stick to common words, idioms, phrasings, and topics, the more easily others can surmise their meaning; conversely, the further they stray from common expressions and topics, the wider the variation in interpretations.
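The two readings of "Greyson saw the man with glasses" correspond to two different parse trees. As a minimal sketch, assuming a hypothetical toy grammar and lexicon written only for this one sentence (they are illustrative and not taken from any real treebank or library), a small CKY-style chart parser can enumerate both attachments of the prepositional phrase:

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (an assumption for illustration).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP modifies the verb phrase: seeing by means of glasses
    ("NP", "NP", "PP"),  # PP modifies the noun phrase: the man who has glasses
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical lexicon; "glasses" is treated as a bare noun phrase for simplicity.
LEXICON = {
    "greyson": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "with": ["P"],
    "glasses": ["NP"],
}


def parse_trees(words):
    """Return every parse tree rooted in S for the given list of words."""
    n = len(words)
    chart = defaultdict(list)  # (start, end, label) -> list of trees
    for i, word in enumerate(words):
        for label in LEXICON.get(word.lower(), []):
            chart[(i, i + 1, label)].append((label, word))
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    for left_tree in chart.get((start, mid, left), []):
                        for right_tree in chart.get((mid, end, right), []):
                            chart[(start, end, parent)].append(
                                (parent, left_tree, right_tree)
                            )
    return chart.get((0, n, "S"), [])


def to_brackets(tree):
    """Render a tree as a bracketed string."""
    if len(tree) == 2:  # leaf: (label, word)
        return f"({tree[0]} {tree[1]})"
    label, left, right = tree
    return f"({label} {to_brackets(left)} {to_brackets(right)})"


if __name__ == "__main__":
    for tree in parse_trees("Greyson saw the man with glasses".split()):
        print(to_brackets(tree))
```

Under this toy grammar the sentence receives two trees, one per reading, while the shorter "Greyson saw the man" receives only one. The parser can list the alternatives, but choosing between them still requires context that the sentence alone does not supply.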

This suggests that sentences do not have an intrinsic meaning, that there is no meaning associated with a sentence or word, and that either can only represent an idea symbolically. "The dog sat on the carpet" is a sentence in English. If someone were to say to someone else, "The dog sat on the carpet," the act is itself an utterance. This implies that a sentence, term, expression or word cannot symbolically represent a single true meaning; such meaning is underspecified (which dog sat on which carpet?) and, consequently, potentially ambiguous. By contrast, the meaning of an utterance can be inferred through knowledge of its context, or pragmatics, leveraging both its linguistic and non-linguistic contexts (which may or may not be sufficient to resolve ambiguity). You may know from previous conversations that it is your neighbour's dog, or that the carpet referred to is the Persian carpet you keep in your bedroom, or even that the dog did not actually sit in the usual sense, because the neighbour's dog has a hip problem that causes him to lie down more than sit, yet "sat" is the closest word for what the dog did. None of this can be recovered from the sentence "The dog sat on the carpet" alone, and the more limited the context, the more ambiguous the sentence becomes and the fewer questions it answers.

# Compositionality and the linguistic connection

Compositionality is the other beast that presents itself as a basic roadblock in the field of NLP.

The field of NLP used to be divided by methodologies and near-term goals. Logical approaches rely on techniques from proof theory and model-theoretic semantics; they have strong ties to linguistic semantics, and they are concerned primarily with inference, ambiguity, vagueness, and compositional interpretation of full syntactic parses. In contrast, statistical approaches derive their tools from algorithms and optimization, and they tend to focus on word meanings and broad notions of semantic content.

The two types of approaches share the long-term vision of achieving deep natural language understanding, but their day-to-day differences can make them seem unrelated and even incompatible. The distinction between logical and statistical approaches has been rapidly disappearing with the development of models that can learn the conventional aspects of natural language meaning from corpora and databases. These models interpret rich linguistic representations in a compositional fashion, and they offer novel perspectives on foundational issues like ambiguity, inference, and grounding. The fundamental question for these approaches is what kinds of data and models are needed for effective learning. Addressing this question is a prerequisite for implementing robust systems for natural language understanding, and the answers can inform psychological models of language acquisition and language processing.

The leading players in this discussion are the aforementioned ambiguity and compositionality. The discussion is deeply united around the concepts of generalization, meaning, and structural complexity. Specifically, compositionality characterizes the recursive nature of the linguistic ability required to generalize to a creative capacity, and learning details the conditions under which such an ability can be acquired from data.

In linguistics, semantic representations are generally logical forms: expressions in a fully specified, unambiguous artificial language. The grammars used in tasks like parsing usually adopt such a view, defining semantic representations with a logical language that has constant symbols for numbers and relations and uses juxtaposition and bracketing to create complex expressions. In the literature, one encounters a variety of different formalisms, for example lambda calculi (Carpenter 1997) or first-order fragments thereof (Bird et al. 2009), natural logics (MacCartney & Manning 2009; Moss 2009), diagrammatic languages (Kamp & Reyle 1993), programming languages (Blackburn & Bos 2005), robot controller languages (Matuszek et al. 2012b), and database query languages (Zelle & Mooney 1996). A given utterance might be consistent with multiple logical forms in our grammar, creating ambiguity.
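To make the last point concrete, here is a minimal sketch of how one utterance can map to several logical forms. It assumes a hypothetical toy grammar, `E -> number | E relation E`, with a handful of number and relation words chosen for illustration; the names `NUMBERS`, `RELATIONS`, `logical_forms`, and `denotation` are inventions of this example and do not come from any of the cited formalisms.

```python
import operator

# Hypothetical constant symbols for numbers and relations (illustrative only).
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4}
RELATIONS = {"plus": operator.add, "minus": operator.sub, "times": operator.mul}


def logical_forms(tokens):
    """Return every logical form the toy grammar E -> number | E relation E allows.

    A logical form is either a number word or a bracketed triple
    (relation, left_form, right_form).
    """
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] in NUMBERS else []
    forms = []
    # Try every relation word as the top-level operator of the expression.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in RELATIONS:
            for left in logical_forms(tokens[:i]):
                for right in logical_forms(tokens[i + 1:]):
                    forms.append((tokens[i], left, right))
    return forms


def denotation(form):
    """Evaluate an unambiguous logical form to the number it denotes."""
    if isinstance(form, str):
        return NUMBERS[form]
    relation, left, right = form
    return RELATIONS[relation](denotation(left), denotation(right))


if __name__ == "__main__":
    for form in logical_forms("two minus three times four".split()):
        print(form, "denotes", denotation(form))
```

Each logical form is itself unambiguous, since its bracketing fixes the order of operations. The ambiguity lies in the mapping from the utterance to logical forms: "two minus three times four" is consistent with two bracketings that denote different numbers, whereas "two plus three" is consistent with only one.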

Compositionality and learning are intimately related: Both concern the ability of a system (human or artificial) to generalize from a finite set of experiences to a creative capacity, and to come to grips with new inputs and experiences effectively. From this perspective, compositionality is a claim about the nature of this ability when it comes to linguistic interpretation, and learning theory offers a framework for characterizing the conditions under which a system can attain this ability in principle. Moreover, establishing the relationship between compositionality and learning provides a recipe for synthesis: the principle of compositionality guides researchers on specific model structures, and machine learning provides them with a set of methods for training such models in practice. More specifically, the claim of compositionality is that being a semantic interpreter for a language L amounts to mastering the syntax of L, the lexical meanings of L, and the modes of semantic combination for L.
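The three ingredients named in that claim can be sketched in a few lines. The following is an illustration under stated assumptions, not a definitive implementation: the syntax is given as hand-written binary trees, the lexicon `LEXICON` is a hypothetical mapping from words to numbers and curried functions, and the only mode of semantic combination is function application. In a learned system these pieces would be induced from data rather than written by hand.

```python
# Lexical meanings (hypothetical): number words denote integers, and relation
# words denote curried functions that take the right operand first.
LEXICON = {
    "two": 2,
    "three": 3,
    "four": 4,
    "plus": lambda y: lambda x: x + y,
    "minus": lambda y: lambda x: x - y,
    "times": lambda y: lambda x: x * y,
}


def interpret(tree):
    """Compute the meaning of a syntax tree from the meanings of its parts.

    A tree is either a word (looked up in the lexicon) or a pair of subtrees.
    The single mode of combination is function application: whichever
    daughter denotes a function is applied to the other daughter's meaning.
    """
    if isinstance(tree, str):
        return LEXICON[tree]
    left, right = (interpret(subtree) for subtree in tree)
    if callable(left):
        return left(right)
    return right(left)


if __name__ == "__main__":
    # Syntax supplied by hand: [two [minus three]]
    print(interpret(("two", ("minus", "three"))))
    # A larger tree built from the same parts: [[two [plus three]] [times four]]
    print(interpret((("two", ("plus", "three")), ("times", "four"))))
```

The second tree is interpreted without adding anything to the lexicon or to the combination rule. That is the sense in which a finite set of word meanings and composition rules generalizes to an unbounded set of new expressions.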