# Introduction to Issues in Compositionality in Phrase Structure

## What is Compositionality?

<font color = green > **The meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them.** </font> <br> 
A sentence's meaning comes from its **structure** and the meanings of its **parts**. 
#### Evidence of Compositionality: language & infinite productive potential
Having understood one complex expression, we can understand a large number of others with the same form, as well as an unbounded number of expressions formed by recombining its parts. <br>
For example, if you understand "*John loves Mary*", you will understand any other sentence of the form "*X loves Y*". I can replace *John* with any number of names, and every resulting sentence will make sense to you, as long as you understand the parts and the structure. 
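
A minimal sketch of this productivity, assuming a toy model in which *loves* denotes a set of (lover, beloved) pairs. The names, individuals, and pairs below are hypothetical. A finite lexicon plus one interpretation rule for the frame *X loves Y* covers every sentence built from it:

```python
from itertools import product

# Hypothetical toy lexicon: each name denotes an individual (here, a short string).
NAMES = {"John": "j", "Mary": "m", "Sue": "s"}

# Hypothetical model: who loves whom, as (lover, beloved) pairs.
LOVES = {("j", "m"), ("m", "s")}


def interpret(subject: str, obj: str) -> bool:
    """Interpret 'subject loves obj' from the meanings of its parts and one rule for the frame."""
    return (NAMES[subject], NAMES[obj]) in LOVES


# Every combination of known names yields an interpretable sentence.
sentences = {f"{x} loves {y}": interpret(x, y) for x, y in product(NAMES, NAMES)}

print(len(sentences))                # 9 sentences from 3 names and 1 rule
print(sentences["John loves Mary"])  # True
print(sentences["Mary loves John"])  # False
```

Adding one name to `NAMES` enlarges the set of interpretable sentences without adding any new rule.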

#### Evidence of Compositionality: structural ambiguity
A structurally ambiguous sentence is a grammatical sentence that can be meaningfully interpreted in two different ways, each corresponding to a different parse structure: 
![Structural%20Ambiguity%20Ex.png](attachment:Structural%20Ambiguity%20Ex.png)
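
The figure's example is not reproduced here. As a separate, hypothetical illustration, the phrase *old men and women* has two parses, and composing the same word meanings along each parse gives different denotations. The toy domain, the intersective treatment of *old*, and reading *and* as set union are all simplifying assumptions:

```python
# Hypothetical toy domain: each individual has a set of properties.
DOMAIN = {
    "al": {"man", "old"},
    "bo": {"man"},
    "cy": {"woman", "old"},
    "di": {"woman"},
}


def noun(prop: str) -> set[str]:
    """A noun denotes the set of individuals that have that property."""
    return {x for x, props in DOMAIN.items() if prop in props}


def old(nbar: set[str]) -> set[str]:
    """Intersective adjective (an assumption): keep only the old members of the set."""
    return nbar & noun("old")


def conj(left: set[str], right: set[str]) -> set[str]:
    """'and' between plural nominals, simplified to the union of the two sets."""
    return left | right


# Parse 1: [old [men and women]]: 'old' applies to both nouns.
parse1 = old(conj(noun("man"), noun("woman")))
# Parse 2: [[old men] and women]: 'old' applies only to 'men'.
parse2 = conj(old(noun("man")), noun("woman"))

print(sorted(parse1))  # ['al', 'cy']
print(sorted(parse2))  # ['al', 'cy', 'di']
```

The words and their meanings are identical in both cases; only the structure differs, and so does the result.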


#### Modeling Compositionality
The goal: find a system for composing representations of individual words into representations of larger units (sentences, paragraphs, etc.). The system should have generalizable rules that reflect human intuitions about how meanings are composed, so that it handles the following linguistic phenomena the way humans do: 

* **Generalizability**: it should be able to interpret a new sentence if it has seen the sentence's parts and its general structure before
* **Word order permutation**: given the meaning of "*x loves y*", it should understand "*y loves x*"
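
A minimal sketch of what handling permutation requires, assuming a toy model and treating *loves* as a curried function that combines with its object first and its subject second (the model and names are hypothetical). Swapping *x* and *y* reuses exactly the same parts but yields a different, still well-defined interpretation:

```python
from typing import Callable

# A transitive verb as a curried function: object first, then subject,
# in the spirit of [λy. [λx. x loves y]].
TransitiveVerb = Callable[[str], Callable[[str], bool]]

# Hypothetical model: the set of (lover, beloved) pairs.
LOVES_PAIRS = {("john", "mary")}
loves: TransitiveVerb = lambda y: lambda x: (x, y) in LOVES_PAIRS


def sentence(subj: str, verb: TransitiveVerb, obj: str) -> bool:
    """Compose [S subj [VP verb obj]]: verb with object first, then VP with subject."""
    vp = verb(obj)
    return vp(subj)


print(sentence("john", loves, "mary"))  # True
print(sentence("mary", loves, "john"))  # False: same parts, swapped positions
```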


# What is Functional Application? 

#### Frege's Conjecture 
* Frege conjectured that semantic composition is function application; Heim and Kratzer build on this idea, treating the denotations of many expressions as functions applied to the denotations of other expressions
* Ex: a phrase structure tree node with two daughters, where
    * one daughter denotes a function
    * the other daughter denotes an element in the domain of that function
    * applying the function (one daughter) to the element (other daughter) yields the denotation of the mother node

This gives us the following very general semantic rule, or composition rule, as given by Schwarz (2017): 

**Functional Application (FA)**: *If a mother node x has exactly two daughters y and z, and ⟦y⟧ is a function that has ⟦z⟧ in its domain, then:*

$$ ⟦x⟧ = ⟦y⟧\ (⟦z⟧) $$
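
A minimal Python sketch of FA under stated assumptions: a denotation that is a function with a domain condition is modeled by a hypothetical `PartialFunction` class, the toy domain and model facts are invented, and truth values are Python booleans. Because y and z are just names in the rule, the sketch treats the two daughters as unordered:

```python
from typing import Any, Callable


class PartialFunction:
    """A function denotation such as [λx: x ∈ D. x smokes]: a domain condition plus a body."""

    def __init__(self, domain: Callable[[Any], bool], body: Callable[[Any], Any]) -> None:
        self.domain = domain
        self.body = body

    def __call__(self, arg: Any) -> Any:
        if not self.domain(arg):
            raise ValueError(f"{arg!r} is not in the function's domain")
        return self.body(arg)


def functional_application(y: Any, z: Any) -> Any:
    """FA: apply whichever daughter denotes a function to the other daughter,
    provided the other daughter's denotation is in that function's domain."""
    if isinstance(y, PartialFunction) and y.domain(z):
        return y(z)
    if isinstance(z, PartialFunction) and z.domain(y):
        return z(y)
    raise TypeError("FA is undefined: neither daughter applies to the other")


# Hypothetical toy model: a domain D of individuals and who smokes in it.
D = {"a", "b"}
SMOKERS = {"a"}
smokes = PartialFunction(lambda x: x in D, lambda x: x in SMOKERS)

print(functional_application(smokes, "a"))  # True
print(functional_application("b", smokes))  # False
```

When the argument falls outside the function's domain (e.g. `"c"` here), FA is simply undefined, which the sketch signals with a `TypeError`.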

Not all mother nodes have exactly two daughters, however. A mother node with just one daughter inherits the denotation of that daughter. So we have a second semantic rule, according to Schwarz (2017):

**FA for Non-branching Nodes (NN)**: *If a mother node x has exactly one daughter y, then:*

$$ ⟦x⟧ = ⟦y⟧ $$ 

To calculate denotations of phrases and sentences, we need to know the denotations of the lexical items (the words) they contain. Lexical items are assigned their denotations in lexical entries. 

Proper nouns are taken to denote elements of $D$ (the domain: the universe of discourse, or set of individuals). The following example structure features two terminal nodes (the words *Nowitzki* and *smokes*) and five non-terminal nodes (the nodes labeled S, DP, N, VP, and V). Lexical entries determine the denotation of each terminal node. The denotation of each non-terminal node can be determined from the denotation(s) of its daughter(s) and one of the semantic rules above. Example taken from Schwarz (2017).

$$ [_S \ [_{DP} \ [_N \ Nowitzki\ ]]\ [_{VP} \ [_V \ smokes \ ]]] $$


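1. $⟦ [_V \ \text{smokes} \ ] ⟧$ = <br> 
$⟦ \text{smokes} ⟧$ = <br>
$[\lambda x : x \in D.\ x \text{ smokes}]$ <br>
(by NN, then the lexical entry for *smokes*)

2. $⟦ [_{VP} \ \text{smokes} \ ] ⟧$ = <br> 
$⟦ [_V \ \text{smokes} \ ] ⟧$ = <br> 
$[\lambda x : x \in D.\ x \text{ smokes}]$ <br>
(by NN, then step 1)

3. $⟦ [_N \ \text{Nowitzki} \ ] ⟧$ = <br> 
$⟦ \text{Nowitzki} ⟧$ = <br> 
DN, the individual Nowitzki (an element of $D$) <br>
(by NN, then the lexical entry for *Nowitzki*)

4. $⟦ [_{DP} \ \text{Nowitzki} \ ] ⟧$ = <br> 
$⟦ [_N \ \text{Nowitzki} \ ] ⟧$ = <br> 
DN <br>
(by NN, then step 3)

5. $⟦ [_S \ \text{Nowitzki} \ \text{smokes} \ ] ⟧$ = <br>
$⟦ [_{VP} \ \text{smokes} \ ] ⟧ \ (⟦ [_{DP} \ \text{Nowitzki} \ ] ⟧)$ = <br> 
$[\lambda x : x \in D.\ x \text{ smokes}]\ (\text{DN})$ = T iff DN smokes <br>
(by FA, then steps 2 and 4, then applying the function to its argument)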

<img src="images/FA_ex_tree.png"/>
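
The same derivation can be carried out mechanically. The sketch below is a toy, not Schwarz's (2017) formalism: trees are nested Python tuples, `True`/`False` stand in for truth values, and the domain, the extra individual `"e1"`, and the two models (which fix who smokes) are hypothetical. It uses only three cases: lexical lookup at terminal nodes, NN at non-branching nodes, and FA at binary nodes:

```python
from typing import Any, Callable, Union

# A tree is a terminal word (str) or a tuple (label, daughter, ...).
Tree = Union[str, tuple]


def smokes_entry(domain: set[str], smokers: set[str]) -> Callable[[str], bool]:
    """Hypothetical lexical entry for 'smokes': [λx: x ∈ D. x smokes] in a toy model."""
    def smokes(x: str) -> bool:
        if x not in domain:
            raise ValueError(f"{x!r} is not in D")
        return x in smokers
    return smokes


def interpret(node: Tree, lexicon: dict[str, Any]) -> Any:
    """Compute a node's denotation using lexical entries, NN, and FA."""
    if isinstance(node, str):  # terminal node: look up its lexical entry
        return lexicon[node]
    label, *daughters = node
    if len(daughters) == 1:  # NN: the mother inherits its daughter's denotation
        return interpret(daughters[0], lexicon)
    if len(daughters) == 2:  # FA: apply the function daughter to the other one
        y, z = (interpret(d, lexicon) for d in daughters)
        if callable(y):
            return y(z)
        if callable(z):
            return z(y)
        raise TypeError(f"FA is undefined at node {label}")
    raise ValueError(f"no rule for {len(daughters)} daughters at node {label}")


TREE = ("S", ("DP", ("N", "Nowitzki")), ("VP", ("V", "smokes")))
D = {"DN", "e1"}  # hypothetical domain: DN plus one other individual

# Two hypothetical models that differ only in who smokes.
model_1 = {"Nowitzki": "DN", "smokes": smokes_entry(D, smokers={"DN"})}
model_2 = {"Nowitzki": "DN", "smokes": smokes_entry(D, smokers={"e1"})}

print(interpret(TREE, model_1))  # True: DN is among the smokers
print(interpret(TREE, model_2))  # False: DN is not
```

Evaluating the one tree in both models makes the truth conditions visible: the sentence comes out true exactly in the model where DN smokes, matching step 5.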

## Why is Functional Application useful?
Functional Application gives us a conceptual tool for calculating the meaning of a phrase from the meanings of its constituents. <br> <br>
Conceptually, this answers the question "How do we model linguistic compositionality?" <br> <br>
**Formal semantics provides a framework for parameterized function learning as a way to capture compositional linguistic representations**
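
One way to picture this, sketched under loud assumptions: in a learned model, an argument word can be represented as a vector and a function word as a parameter matrix, so that FA becomes matrix-vector multiplication. Treating function words as matrices is one option among several, the 2-d vectors and matrix entries below are hypothetical, and training, which would fit those entries, is not shown:

```python
# Hypothetical 2-d word vectors for argument words (nouns).
VECTORS = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}

# Hypothetical parameter matrix for a function word (an adjective).
# In a learned model these entries would be fit by training (not shown).
MATRICES = {"old": [[0.5, 0.0], [0.2, 1.0]]}


def apply_matrix(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Parameterized FA: apply the function word's matrix to the argument vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# The same rule composes any function word with any argument word.
print(apply_matrix(MATRICES["old"], VECTORS["dog"]))  # [0.5, 0.2]
print(apply_matrix(MATRICES["old"], VECTORS["cat"]))  # [0.0, 1.0]
```

Because the output lives in the same space as the input, it can itself serve as the argument of another function word, much as the output of FA at one node feeds the node above it.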


# Evaluations of Compositionality in Semantic Representations


## Model Shortcuts: Feature Detection 
Linguistic representations usually have surface (superficial) features that, in the vast majority of cases, neural networks can successfully use in place of truly compositional representations.

<br> 
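**Representation complexity** (from least to most structured):
1. Bag of words (BOW): ignores word order
2. Linear: sensitive to linear word order
3. Compositional: sensitive to hierarchical structure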

For example, permuting the word order would confound a BOW model, since it cannot tell the difference between: 
* John loves Mary.
* Mary loves John. <br>

A sequential (linear) model, however, might solve categorization tasks that require distinguishing these two sentences. 
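
A quick check of this contrast, treating a bag of words as a `collections.Counter` over tokens and using the ordered token sequence (plus adjacent word pairs) as a stand-in for a linear representation; both are simplifications:

```python
from collections import Counter

s1 = "John loves Mary".split()
s2 = "Mary loves John".split()

# Bag of words: unordered token counts, so the two sentences are identical.
print(Counter(s1) == Counter(s2))  # True

# Linear representation: the ordered token sequence,
# plus adjacent word pairs (bigrams) as order-sensitive features.
print(s1 == s2)  # False
print(list(zip(s1, s1[1:])))  # [('John', 'loves'), ('loves', 'Mary')]
```

Order alone is enough to separate this pair, so succeeding on it does not show that a model has gone beyond a linear representation.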

So it is relatively easy to evaluate a model and determine whether its representation is closer to BOW or to linear. The challenge is to evaluate whether the model has captured compositionality or instead relies on a simpler representation that tracks linear word order and detects superficial linguistic features (e.g., the presence of an -s morpheme).
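
A toy illustration of such a shortcut, under stated assumptions: the task (choosing *is* or *are* after a subject phrase), the noun list, the example phrases, and the simplification that the subject's head is its first noun are all hypothetical. Both rules use the same surface cue, a final -s, but the shortcut reads it off the noun nearest the verb slot in linear order, while the structural rule reads it off the head noun:

```python
# Hypothetical noun list for the toy examples.
NOUNS = {"key", "keys", "cabinet", "cabinets", "dog", "dogs"}


def shortcut(subject: list[str]) -> str:
    """Surface heuristic: check the noun closest to the verb slot for a final -s."""
    last_noun = [w for w in subject if w in NOUNS][-1]
    return "are" if last_noun.endswith("s") else "is"


def structural(subject: list[str]) -> str:
    """Structure-aware rule (simplified): agreement follows the subject's head noun,
    which in these examples is the first noun, before any prepositional modifier."""
    head = [w for w in subject if w in NOUNS][0]
    return "are" if head.endswith("s") else "is"


# Hypothetical examples: subject phrase -> correct verb form.
examples = {
    "the dogs": "are",
    "the dog": "is",
    "the key to the cabinets": "is",
    "the keys to the cabinet": "are",
}

shortcut_acc = sum(shortcut(p.split()) == g for p, g in examples.items()) / len(examples)
structural_acc = sum(structural(p.split()) == g for p, g in examples.items()) / len(examples)
print(shortcut_acc, structural_acc)  # 0.5 1.0
```

On simple subjects the two rules agree, so accuracy on such sentences cannot tell them apart; only examples where linear proximity and structure point to different nouns expose the shortcut.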