# Computing meaning

In this notebook, we will demonstrate how the meanings of large expressions can be built up from the meanings of their components.

There are five parts to the notebook. Everyone should first do Part 1 and Part 2, which go over the basics of formal semantics. After that, you can choose whether to continue to Part 3 (which goes into some more advanced formal semantics) or to skip ahead to Part 4 and Part 5 (which give a taste of how these semantic notions could be instantiated in NLP).

The formal semantics components of the notebook use the Lambda Notebooks formalism developed by Kyle Rawlins, available at https://github.com/rawlins/lambda-notebook.

## Part 1: Compositionality with Math

Compositional semantics is the study of how to construct the meaning of complex phrases and sentences from their component parts.

Before moving into language, we will start with the following simple example from math: 

**two plus three times five equals seventeen**

We will derive an expression of the meaning of this entire sentence by building up the meanings of the sub-expressions.

### a. Numbers

To start off, the easiest things to deal with in this sentence are the numbers. For example, the meaning of the word "three" is the number 3. We indicate this by writing [[three]] = 3, where the double brackets are used to represent the meaning of whatever is contained in the brackets. Run the following cell to store this meaning of the word "three" in this notebook's memory:

In [None]:
%%lamb
||three|| = 3

Now fill in the proper meanings for the remaining relevant numbers:

In [None]:
%%lamb
||two|| = #REPLACE WITH ANSWER#
||five|| = #REPLACE WITH ANSWER#
||seventeen|| = #REPLACE WITH ANSWER#

### b. Operators

Now we come to a somewhat harder question: What is the meaning of "times"? It's pretty easy to tell what the meaning of "three times five" should be, namely 15. However, in the semantics framework we are using, you may only combine two words at a time, whereas "three times five" contains three words. Therefore, we need to find some way to build up the phrase "three times five" from smaller units. 

We will do this by first creating a meaning for just the phrase "times five." What can "times five" mean? Well, we know that when we insert "three" we get "three times five", so we can think of "times five" as being a function that takes an argument (in this case 3) and multiplies it by 5. It is easy to write such a function in Python:

In [None]:
def times_five(x):
    return x * 5

In [None]:
times_five(3) # This should give 15

Equivalently, the Python function can be written with a lambda expression as follows:

In [None]:
(lambda x: x * 5)(3)

The first component of that line (lambda x: x \* 5) defines a function, which takes x as its argument and returns x\*5. Then the 3 in parentheses is an argument that has been passed to this function, and the result of passing this argument is 15.

Now we can ask again what "times" should mean. Since "times five" should give us something like the Python function (lambda x: x \* 5), we want "times" to give us something that, when given 5 as an argument, returns (lambda x: x \* 5). To accomplish this, we can simply add in another lambda layer as follows:

In [None]:
(lambda y: (lambda x: x * y))

When this function is given 5 as an argument, the outer lambda layer is evaluated, and the value that is returned is the function (lambda x: x \* 5), which is the same as our times_five function from before. If you then pass 3 as an argument to this output function, you should get 15 as before, as demonstrated in the following cell:

In [None]:
(lambda y: (lambda x: x* y))(5)(3)

Therefore, we now can write the meaning for "times" using this lambda notation (the subscript $n$ indicates that $x$ and $y$ are of type number):

In [None]:
%%lamb
||times|| = L y_n : L x_n : x * y

Now we can compose together "times" and "five" to get the meaning of "times five". In our notation, we use the "\*" symbol to compose together the meanings of two words:

In [None]:
times * five

The output should tell you that the meaning of "times five" is a function that takes an argument $x$ and returns $x * 5$. The meaning of "three times five" should also behave as expected:

In [None]:
three * (times * five)

In the cell below, fill in the proper meaning for "plus":

In [None]:
%%lamb
||plus|| = #REPLACE WITH ANSWER#

Finally, we just need to define "equals" (in this notation, you have to use "<=>" for equality):

In [None]:
%%lamb
||equals|| = L y_n : L x_n : x <=> y
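The way the "\*" operator combines meanings can be mimicked in plain Python. The sketch below is only a rough analogue, not how the Lambda Notebook is implemented: the helper name `compose` is our own, and it simply applies whichever of its two inputs is a function to the other one. It uses only the meanings of "times" and "equals" given above, and checks the small sentence "three times five equals fifteen".

```python
# Rough plain-Python analogue of the `*` composition operator
# (an illustration only; `compose` is a hypothetical helper name).
def compose(a, b):
    """Apply whichever of a, b is a function to the other one."""
    if callable(a):
        return a(b)
    if callable(b):
        return b(a)
    raise TypeError("neither meaning is a function, so they cannot compose")

# Curried meanings, mirroring the %%lamb definitions of "times" and "equals"
times = lambda y: lambda x: x * y
equals = lambda y: lambda x: x == y

times_five = compose(times, 5)        # the function x -> x * 5
fifteen = compose(3, times_five)      # "three times five"
print(fifteen)                        # 15

# "three times five equals fifteen"
print(compose(fifteen, compose(equals, 15)))  # True
```

Because `compose` checks which side is callable, the function can sit on either side, just as "times five" follows "three" but "equals seventeen" follows the whole left-hand side.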

### c. Evaluating an equation

Now we can evaluate the meaning of our entire original equation, which was **two plus three times five equals seventeen**. The equation is true, so the expression will evaluate as True.

In [None]:
((two * (plus * (three * (times * five)))) * (equals * seventeen))

### d. Another example

Now we are going to try to build up the meaning for the following expression:

**four cubed times five eighths minus nine squared minus negative eleven**

First fill in the meanings for the new words in this expression:

In [None]:
%%lamb
||four|| = #REPLACE WITH ANSWER#
||cubed|| = #REPLACE WITH ANSWER#
||five|| = #REPLACE WITH ANSWER#
||eighths|| = #REPLACE WITH ANSWER#
||minus|| = #REPLACE WITH ANSWER#
||nine|| = #REPLACE WITH ANSWER#
||squared|| = #REPLACE WITH ANSWER#
||negative|| = #REPLACE WITH ANSWER#
||eleven|| = #REPLACE WITH ANSWER#

Now it's time to evaluate our expression. Recall that the expression is:

**four cubed times five eighths minus nine squared minus negative eleven**

In the cell below, type the command necessary to evaluate this sentence:

In [None]:
#REPLACE WITH ANSWER#

Your cell above should evaluate the same as the expression below. If it doesn't, fix it! Pay careful attention to the order in which you combine words; ((four times three) plus two) does not mean the same thing as (four times (three plus two)). If you're stuck, try focusing on smaller subparts of the expression. For example, make sure that [[five eighths]] = 0.625.

In [None]:
4**3 * (5/8) - 9**2 - (-11)

## Part 2: Compositionality with Language

### a. Proper nouns

Now that we've sorted out numbers, we can extend the formalism to the more interesting case of natural language. The linguistic components with the simplest meanings are proper nouns; much as number words have numbers as their meanings, proper nouns have entities in the real world as their meanings. For example, the meaning of "John" is the person John (shown here with a subscript $e$ because John is an entity):

In [None]:
%%lamb
||John|| = John_e

### b. Verbs

Next up are verbs. We will think of verbs as functions that take entities as their arguments and return either true or false depending on whether the verb applies to that argument. In pseudocode, we might write this sort of function as follows:

`
def Walked(x):
    if x walked:
        return True
    if x did not walk:
        return False
`

In our notation, we would write the meaning of "walked" as follows:

In [None]:
%%lamb
||walked|| = L x: Walked(x)

We can now compute the meaning of the short sentence "John walked":

In [None]:
John * walked

In semantics terminology, the meaning of the sentence "John walked" is the proposition "Walked(John)", which is evaluated as true if John did indeed walk or false otherwise. A proposition is simply some claim about the state of the world. For example, in this case, "Walked(John)" is making the claim that the world is such that John walked.
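To see how a proposition gets a truth value, we can pair the predicate with a tiny made-up world. In the sketch below, the set `WALKERS` and the entity names are assumptions chosen purely for illustration; the notebook itself leaves Walked uninterpreted.

```python
# Hypothetical toy world: the set of entities that walked.
WALKERS = {"John", "Mary"}

def walked(x):
    """Meaning of "walked": maps an entity to True or False."""
    return x in WALKERS

print(walked("John"))  # True in this toy world
print(walked("Sue"))   # False in this toy world
```

Changing the contents of `WALKERS` changes the truth value of "John walked" without changing its meaning, which is still the claim Walked(John).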

### d. Transitive verbs

Suppose John has a dog named Fluffy. Fill in the meaning of "Fluffy":

In [None]:
%%lamb
||Fluffy|| = #REPLACE WITH ANSWER#

Try running the following cell:

In [None]:
John * (walked * Fluffy)

You should have gotten an error. This is because, when we defined "walked" above, we did not allow it to have a direct object; it was a function that only took one argument (its subject). Therefore, when we tried to give it two arguments as in the above cell, we got an error. To fix this, we'll need to create a second definition of "walked" where it is transitive. Think back to how we defined "times" as a two-layer lambda function; transitive verbs are handled similarly.

In [None]:
%%lamb
||walked2|| = L y: L x: Walked2(x,y)

Now we can get the meaning for "John walked Fluffy" without an error:

In [None]:
John * (walked2 * Fluffy)
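The order of the two lambda layers matters: the object is supplied first and the subject second. The following plain-Python sketch illustrates this with a made-up relation `WALKED2` (a set of walker/walked pairs that we are assuming for the example).

```python
# Hypothetical toy world: (walker, walked) pairs.
WALKED2 = {("John", "Fluffy")}

# Two-layer function: the object y is supplied first, then the subject x.
walked2 = lambda y: lambda x: (x, y) in WALKED2

walked_fluffy = walked2("Fluffy")      # meaning of "walked Fluffy"
print(walked_fluffy("John"))           # "John walked Fluffy"  -> True
print(walked2("John")("Fluffy"))       # "Fluffy walked John"  -> False
```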

### e. Ditransitive verbs

Let's create an even more complex verb. Fill in the meaning for "showed" below so that the next cell gives Showed(John, Chris, Paris) as the meaning for "John showed Chris Paris". (Make sure you get the right order of the arguments; the meaning should not be Showed(John, Paris, Chris).)

In [None]:
%%lamb
||showed|| = #REPLACE WITH ANSWER#
||Chris|| = Chris_e
||Paris|| = Paris_e

In [None]:
John * ((showed * Chris) * Paris)

You can now choose to either continue to Part 3 to further explore how the meaning of natural language is expressed in formal semantics, or to skip ahead to Parts 4 and 5 to explore how the basic notions of formal semantics might be computationally instantiated for NLP.

## Part 3: More advanced compositionality with natural language

### a. Common nouns

We've dealt with proper nouns (such as "John"), but what about regular nouns, like "dog"? Whereas "John" refers to a specific entity in the world, "dog" refers to a more general class of things. It turns out that we can think of nouns (like verbs) as functions that take an entity and return true or false. For example, we would say that [[Fluffy is a dog]] = Dog(Fluffy). To allow this sort of interpretation, we will give "dog" the following meaning:

In [None]:
%%lamb
||dog|| = L x: Dog(x)

Even though "dog" does not refer to a specific entity, the phrase "the dog" does refer to a specific entity - namely, whichever dog is currently most relevant to the conversation. To facilitate this, we give the following definition for "the", where $\iota x$ means "the unique conversationally relevant entity x" (that symbol is the Greek letter iota):

In [None]:
%%lamb
||the|| = L f_<e,t> : Iota x_e : f(x) 

Thus, for example, we can view the meaning of "the dog" as follows, where the output of the cell should be read as "the unique conversationally relevant entity x such that x is a dog":

In [None]:
the * dog
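One way to get a feel for $\iota$ is to implement it over a small, made-up domain. The sketch below is a simplification under stated assumptions: the sets `ENTITIES` and `DOGS` are hypothetical, and "conversationally relevant" is approximated by treating every entity in the toy domain as relevant.

```python
# Hypothetical toy world.
ENTITIES = {"Fluffy", "John", "Paris"}
DOGS = {"Fluffy"}

dog = lambda x: x in DOGS

def the(f):
    """Return the one entity satisfying predicate f; fail if it is not unique."""
    matches = [x for x in ENTITIES if f(x)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    return matches[0]

print(the(dog))  # Fluffy
```

If the domain contained two dogs, or none, `the(dog)` would raise an error, which mirrors the intuition that "the dog" only makes sense when exactly one dog is under discussion.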

### b. Prepositional phrases

So far, most of the meanings we've built up may have seemed pretty obvious. But it becomes more interesting when we add in modifiers. For example, what should the meaning of "the dog near John" be? With just "the dog", we had the meaning $\iota$x . Dog(x), but now we want to add in the fact that the dog is near John. We would represent this as $\iota$x . Dog(x) $\wedge$ Near(x, John) (to be read as "the unique conversationally relevant entity x such that x is a dog and x is near John"; the $\wedge$ symbol means "and"). In the cell below, we give a definition for "near" that accomplishes this:

In [None]:
%%lamb
||near|| = L z: L y_<e,t>: L x: y(x) & Near(x, z)

In [None]:
the * (dog * (near * John))
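The three lambda layers of "near" can be traced in plain Python. In this sketch the sets `DOGS` and `NEAR` are made up for illustration; `near` first takes the landmark, then the noun's predicate, and returns a narrower predicate.

```python
# Hypothetical toy world.
DOGS = {"Fluffy", "Rex"}
NEAR = {("Fluffy", "John")}   # (x, z) means "x is near z"

dog = lambda x: x in DOGS

# near: landmark z -> predicate y -> entity x -> truth value
near = lambda z: lambda y: lambda x: y(x) and (x, z) in NEAR

dog_near_john = near("John")(dog)   # meaning of "dog near John"
print(dog_near_john("Fluffy"))      # True: a dog, and near John
print(dog_near_john("Rex"))         # False: a dog, but not near John
```

Note that "dog near John" is still a predicate of the same type as "dog", which is why "the" can combine with it afterwards.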

Here are a few more words; fill in the relevant meanings:

In [None]:
%%lamb
||inside|| = #REPLACE WITH ANSWER#
||on|| = #REPLACE WITH ANSWER#
||sandwich|| = #REPLACE WITH ANSWER#
||house|| = #REPLACE WITH ANSWER#
||table|| = #REPLACE WITH ANSWER#

Now you should be able to run the following cell to get the meaning of "the sandwich inside the house on the table":

In [None]:
(the * (sandwich * (inside * (the * (house * (on * (the * table)))))))

But wait a second! Examine the above output carefully; it says that the house is on the table, which seems wrong. What we really want to say is that the sandwich is inside the house and that the sandwich is on the table. In other words, we really want to have [[the sandwich inside the house on the table]] = $\iota$x . Sandwich(x) $\wedge$ Inside(x, $\iota$x1 . House(x1)) $\wedge$ On(x, $\iota$x2.Table(x2)). The cell below currently contains a copy of the cell above. Edit this copy so that it gives you the correct meaning of "the sandwich inside the house on the table."

In [None]:
(the * ((sandwich * (inside * (the * house))) * (on * (the * table))))

### c. Adjectives

Adjectives are another type of common modifier for nouns. Based on the prepositional phrase examples above, write the meanings for "blue" and "big" in the cell below so that "the blue dog" in the next cell evaluates properly as [[the blue dog]] = $\iota$x . Dog(x) $\wedge$ Blue(x), and "the big blue dog" evaluates as [[the big blue dog]] = $\iota$x . Dog(x) $\wedge$ Blue(x) $\wedge$ Big(x):

In [None]:
%%lamb
||blue|| = #REPLACE WITH ANSWER#
||big|| = #REPLACE WITH ANSWER#

In [None]:
the * (blue * dog)

In [None]:
the * (big * (blue * dog))

### d. Quantifiers

The final types of words we'll discuss are the quantifiers "every" and "a", defined below:

In [None]:
%%lamb
||every|| = L f_<e,t> : L g_<e,t> : Forall x_e : f(x) >> g(x)
||a|| = L f_<e,t> : L g_<e,t> : Exists x_e : f(x) & g(x)

These definitions introduce a few more symbols. The symbol $\forall$ means "for all", the symbol $\exists$ means "there exists", and the symbol $\rightarrow$ means "implies". For example, running the cell below shows that the meaning of "every dog walked" is "for all entities x, the fact that x is a dog implies that x walked":

In [None]:
(every * dog) * walked

And the next cell shows that the meaning of "a dog walked" is "there exists an entity x such that x is a dog and x walked":

In [None]:
(a * dog) * walked
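Over a finite domain, these two quantifiers correspond closely to Python's built-in `all` and `any`. The sketch below assumes a made-up world (`ENTITIES`, `DOGS`, and `WALKERS` are hypothetical) and writes the implication f(x) $\rightarrow$ g(x) as `(not f(x)) or g(x)`.

```python
# Hypothetical toy world.
ENTITIES = {"Fluffy", "Rex", "John"}
DOGS = {"Fluffy", "Rex"}
WALKERS = {"Fluffy", "John"}

dog = lambda x: x in DOGS
walked = lambda x: x in WALKERS

# "f(x) implies g(x)" is equivalent to "(not f(x)) or g(x)"
every = lambda f: lambda g: all((not f(x)) or g(x) for x in ENTITIES)
a = lambda f: lambda g: any(f(x) and g(x) for x in ENTITIES)

print(every(dog)(walked))  # False: Rex is a dog that did not walk
print(a(dog)(walked))      # True: Fluffy is a dog that walked
```

The implication is what lets John, who is not a dog, be ignored by "every dog walked": for non-dogs, `(not f(x))` is already true.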

### e. A complicated sentence

If you did the previous parts correctly, you should be able to run the following cell to generate the meaning of the sentence "Every dog inside the blue house showed John the big blue sandwich on the house."

In [None]:
(every * (dog * (inside * (the * (blue * house))))) * ((showed * John) * (the * (big * (blue * (sandwich * (on * (the * house)))))))

### f. Some puzzles

Below are meaning representations for several English words. Figure out which word goes with each representation (there may be multiple correct answers):

1. $\lambda$f . $\lambda$ x. f(x) $\wedge$ $\forall$ y . f(y) $\rightarrow$ height(y) $\leq$ height(x)
2. $\lambda$f . $\iota$ x. f(x) $\wedge$ possesses(you, x)
3. $\lambda$f . $\lambda$ x. f(x,x) 



## Part 4: Word embeddings

### a. Distributional semantics

The previous sections described how you can build up the meanings of phrases from smaller phrases. However, this approach might seem unsatisfying when it comes to defining the meanings of individual words. For example, this approach states that [[Paris]] = Paris$_e$, but saying this seems to completely avoid answering the question of what exactly Paris means! Similarly, when we say that [[dog]] = $\lambda$ x . Dog(x), this completely avoids the question of what Dog(x) means.

What can we do to create more satisfying definitions of individual words? This question is an important one for NLP because computers have no built-in world knowledge, so if you want a computer to understand a sentence, you have to tell the computer how to build the sentence's meaning from the ground up - including what the meanings of the individual words are.

A tempting approach to the task of defining words is to use some sort of dictionary definition. For example, you could define "puppy" as "a young dog." However, this does not help much for NLP, because the computer does not know what "a", "young", and "dog" mean either; the definition just pushes the problem back onto other undefined words.

The most popular solution in NLP is to define a word's meaning with a vector (such a vector is called a "word embedding" because it embeds the word's meaning in high-dimensional space). Run the following cells to load a pre-trained set of word embeddings.

In [None]:
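import numpy as np

# Map each word to its embedding vector
glove_dict = {}

# Each line of the file holds a word followed by its vector's values
with open("glove.6B.50d.small.txt", encoding="utf-8") as glove:
    for line in glove:
        parts = line.split()
        glove_dict[parts[0]] = np.array(list(map(float, parts[1:])))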

So, for example, lots of people wonder what the meaning of life is. By running the following cell, you can get the vector that represents the meaning of "life"!

In [None]:
glove_dict["life"]

How were the values in these vectors computed? The answer is by using something called the **distributional hypothesis**, which is usually stated as "You shall know a word by the company it keeps" (a quote from linguist John Firth). In other words, you can determine the meaning of a word, say "dog", based on the contexts in which it occurs.

This notion makes some sense if you think about how babies acquire word meanings. For most words, no one ever tells a child what the word means; instead, the child learns what the word means by observing the contexts where the word is used. You can perform this kind of inference by considering the following sentence: "A sengi scurried past my feet and disappeared into a pile of leaves." Even if you've never seen the word "sengi" before, you can get a pretty good idea of its meaning just from this single sentence - it probably means some sort of small, mouse-like critter (which is correct; "sengi" is another name for the elephant shrew). 

Based on this intuition, the embedding vector for a given word is usually constructed by taking a large corpus, observing the context around every instance of that word, and training a vector that reflects these contexts. The idea is that, based on the distributional hypothesis, if you represent a word's distribution, that is effectively the same as representing its meaning. The specific word vectors that you just loaded use this general strategy; they come from the GloVe project of Pennington, Socher, and Manning (2014) (https://nlp.stanford.edu/projects/glove/).
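To make "observing the context around every instance of a word" concrete, here is a minimal sketch that simply counts neighboring words in a made-up three-sentence corpus. This is not the GloVe training procedure; it only illustrates the kind of raw co-occurrence information that such methods start from. The corpus, the window size, and all variable names are assumptions for the example.

```python
from collections import Counter, defaultdict

# Made-up corpus for illustration
corpus = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the dog ate the food",
]

WINDOW = 1  # count neighbors one position to the left and right

# cooc[word][neighbor] = number of times neighbor appears next to word
cooc = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[word][tokens[j]] += 1

# Turn the counts into vectors with one dimension per vocabulary word
vocab = sorted({w for s in corpus for w in s.split()})
dog_vec = [cooc["dog"][w] for w in vocab]
cat_vec = [cooc["cat"][w] for w in vocab]

print(vocab)    # ['ate', 'cat', 'chased', 'dog', 'food', 'mouse', 'the']
print(dog_vec)  # [1, 0, 1, 0, 0, 0, 2]
print(cat_vec)  # [0, 0, 1, 0, 0, 0, 2]
```

Even in this tiny corpus, "dog" and "cat" end up with similar count vectors because both occur next to "the" and "chased".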

### b. Vector similarity as a way of approximating similarity in meaning

You may still find these word embeddings somewhat unsatisfying as representations of meaning, since the vectors themselves seem pretty impenetrable. However, it turns out that they can be used in some pretty interpretable ways. Because each vector is a point in high-dimensional space, we can compare words geometrically, on the assumption that words with similar vectors have similar meanings (where vector similarity is measured as the cosine of the angle between the two vectors in question). Run the following cell to define methods for taking a word and returning the word whose embedding is closest to that embedding:

In [None]:
def cos_sim(vec_a, vec_b):
    # Cosine of the angle between the two vectors
    return np.dot(vec_a, vec_b)/(np.linalg.norm(vec_a) * np.linalg.norm(vec_b))

def find_closest(word_a):
    # Start below the smallest possible cosine (-1) so a word is always returned
    max_sim = -np.inf
    closest = ""
    
    for word_b in glove_dict:
        if word_b == word_a:
            continue
        sim = cos_sim(glove_dict[word_a], glove_dict[word_b])
        if sim > max_sim:
            max_sim = sim
            closest = word_b
            
    return closest

Now try out this function with a few words; a couple of examples are below. In general, the closest neighbor of a word should be relatively similar in meaning to that word.

In [None]:
find_closest("dog")

In [None]:
find_closest("king")
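To see what cosine similarity is measuring, it can help to try it on vectors small enough to picture. The sketch below uses hand-made 2-dimensional vectors (not real GloVe values; the numbers are assumptions chosen so that "dog" and "puppy" point in nearly the same direction while "car" points elsewhere).

```python
import numpy as np

def cos_sim(vec_a, vec_b):
    # Cosine of the angle between the two vectors
    return np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))

# Hand-made toy vectors, for illustration only
toy = {
    "dog":   np.array([1.0, 0.9]),
    "puppy": np.array([0.9, 1.0]),
    "car":   np.array([1.0, -0.8]),
}

sim_puppy = cos_sim(toy["dog"], toy["puppy"])
sim_car = cos_sim(toy["dog"], toy["car"])
print(round(float(sim_puppy), 3), round(float(sim_car), 3))
print(sim_puppy > sim_car)  # True: "puppy" is the closer neighbor of "dog"
```

Cosine similarity depends only on the direction of the vectors, not their length, so scaling a vector by a positive constant leaves its similarities unchanged.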

### c. Analogies with word embeddings

A cooler way of using these vectors to explore relationships between words is by writing analogies, which can be thought of as equations. For example, consider the analogy "king is to man as queen is to woman." If we think of the four words as vectors, we can represent this analogy as "king - man $\approx$ queen - woman" - that is, the vector for "king" minus the vector for "man" should be approximately equal to the vector for "queen" minus the vector for "woman." 

We can then rearrange this equation to get "king - man + woman $\approx$ queen". With the equation in this form, we can then test whether our word embeddings successfully capture word meanings by using them to complete analogies. In this case, we will do it by computing the value of the left hand side (king - man + woman), which will give us some vector. Then we simply find the word in our corpus whose vector is closest to the vector for the left hand side, and we will say that this word is the answer to the analogy question. The following cells define functions that allow you to complete such analogies:

In [None]:
def find_closest_vec(vec, word_a, word_b, word_c):
    # Start below the smallest possible cosine (-1) so a word is always returned
    max_sim = -np.inf
    closest = ""
    
    for word_d in glove_dict:
        # Skip the three words that were used to build the query vector
        if word_d in (word_a, word_b, word_c):
            continue
        sim = cos_sim(vec, glove_dict[word_d])
        if sim > max_sim:
            max_sim = sim
            closest = word_d
            
    return closest

def analogy(word_a, word_b, word_c):
    # Vector arithmetic for the analogy: a - b + c
    new_vec = glove_dict[word_a] - glove_dict[word_b] + glove_dict[word_c]
    return find_closest_vec(new_vec, word_a, word_b, word_c)

Try out some analogies to see whether the GloVe vectors give you the answers you expect. Below are some examples of successful analogies, but you will likely be able to find examples that are not so successful. To complete the analogy "word_1 is to word_2 as ??? is to word_3", use the command "analogy(word_1, word_2, word_3)", and the result will be a guess at the value of ???.

In [None]:
analogy("king", "man", "woman")

In [None]:
analogy("kitten", "cat", "dog")

In [None]:
analogy("paris", "france", "germany")

In [None]:
analogy("man", "men", "women")

In [None]:
analogy("ate", "eat", "run")
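The arithmetic behind these analogies is easiest to follow with vectors you can check by hand. The sketch below uses made-up 2-dimensional vectors, where we assume the first dimension roughly encodes royalty and the second roughly encodes gender; real GloVe dimensions are not interpretable in this direct way. The names `toy` and `toy_analogy` are our own.

```python
import numpy as np

# Hand-made vectors: dimension 0 ~ royalty, dimension 1 ~ gender (assumptions)
toy = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}

def cos_sim(vec_a, vec_b):
    return np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))

def toy_analogy(word_a, word_b, word_c):
    # a - b + c, then pick the most similar remaining word
    target = toy[word_a] - toy[word_b] + toy[word_c]
    candidates = [w for w in toy if w not in (word_a, word_b, word_c)]
    return max(candidates, key=lambda w: cos_sim(target, toy[w]))

# king - man + woman = [1, 1] - [0, 1] + [0, -1] = [1, -1]
print(toy_analogy("king", "man", "woman"))  # queen
```

Here the result lands exactly on the "queen" vector; with real embeddings the result is only approximately equal to some word's vector, which is why a nearest-neighbor search is needed.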

## Part 5: Tree RNNs

In Part 4, you saw how to represent word meanings in a computationally useful way. Meanwhile, Parts 1 through 3 showed how to put together word meanings to create sentence meanings. If we want to compute vectors representing sentence meanings, we can combine these two ideas; this is the basic intuition behind some neural network architectures called recursive neural networks or tree RNNs.

Suppose you wanted to compute a representation of the meaning of the sentence "John walked Fluffy". Recall from before that, in formal semantics, this meaning would be generated by first composing the meanings of "walked" and "Fluffy" (to give $\lambda$ x. Walked(x, Fluffy)), and then composing that result with the meaning of "John" (to give Walked(John, Fluffy)). We can represent the order of this composition with the following tree (which shows "walked" composing with "Fluffy", and then "John" composing with the result):

<img src="tree.png" width="40%">

Our goal is now to compute a vector representation of this whole tree. To start out, we can fill in vector representations for the leaves of the trees by using the word embeddings for these words:

<img src="tree_2.png" width="40%">

Now, we will train a neural network to compose two vectors into a single vector of the same dimensionality (there are many possible training objectives you could use for this training). We would then use this neural network to take the vector for "walked" and the vector for "Fluffy" and generate a single vector for "walked Fluffy":

<img src="tree_3.png" width="40%">

Similarly, we can take the vector for "John" and the vector for "walked Fluffy" and use our trained neural network to compose them together into a single vector:

<img src="tree_3.png" width="40%">
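The two composition steps above can be sketched in a few lines of NumPy. This is only an illustration under several assumptions: the composition function shown (concatenate the two child vectors, apply a weight matrix and bias, then a tanh) is one common choice rather than the only one, the weights `W` and `b` are random and untrained, and random vectors stand in for the real word embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50  # same dimensionality as the 50-d GloVe vectors

# Random stand-ins for the word embeddings (hypothetical values)
embed = {w: rng.normal(size=DIM) for w in ("John", "walked", "Fluffy")}

# Untrained composition parameters; a real model would learn these
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
b = np.zeros(DIM)

def compose(left, right):
    """Map two DIM-sized child vectors to one DIM-sized parent vector."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Follow the tree bottom-up: first "walked Fluffy", then the whole sentence
walked_fluffy = compose(embed["walked"], embed["Fluffy"])
sentence = compose(embed["John"], walked_fluffy)
print(sentence.shape)  # (50,)
```

Because `compose` returns a vector of the same size as its inputs, the same function can be reused at every node of the tree, however deep the tree is.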

This single vector at the root of the tree is then what we use as the vector representation of the whole sentence. In this way, we can combine word embeddings with the basic concepts of compositional semantics as one possible technique for creating meaning representations for whole sentences: We use the word embeddings to represent the meanings of words, then we use the structure determined by compositional semantics to c