# 11. Semantics 1: words

![title](media/simba.jpg)

### 11.1 [What is (computational) semantics?](#11.1)

### 11.2 [Word meaning](#11.2)

### 11.3 [Lexical relationships: synonymy, homonymy, hypernymy](#11.3)

### 11.4 [Lexical ontologies](#11.4)

### 11.5 [Semantic similarity](#11.5)

# 11.1 What is (computational) semantics?
<a id='11.1'></a>

We have seen ways to analyze, parse, annotate, translate, etc. written text...

...but without modeling its __meaning__

__Semantics__ is the study of __meaning__

__Semantics__ is the study of meaning, and in linguistics and NLP it refers to the meaning of linguistic utterances. Applications covered in this course so far are all possible without any explicit representation of what certain linguistic structures mean. Even complex processes like machine translation can be ignorant of what each word, phrase or sentence _means_, i.e. what _information_ it conveys to speakers of the language.

Since the semantics of an utterance can be thought of as a mapping from linguistic structures to a representation of the world, it is closely connected with the fields of __philosophy__, __logic__, and __knowledge representation__. Some consider semantics to be __AI-complete__, i.e. at least as difficult as modeling human cognition.

## Applications 1: supporting common NLP tasks

Many levels of language processing can benefit from analyzing meaning:

### Syntactic parsing

_I made spaghetti with meatballs_

_I made spaghetti with my sister_

_I shot an elephant in my pajamas_

### Machine translation

![title](media/mt.jpg)

## Applications 2: semantic tasks

Other tasks rely on semantics so heavily that they are considered semantic technologies:
- question answering
- recognizing entailment
- semantic web
- personal assistants
- conversational agents

### Question answering

the process of generating adequate answers to users' questions, based on some knowledge of the world:

![title](media/qa.jpg)

### Recognizing entailment

deciding whether one statement _implies_ another or not

![title](media/rte.jpg)

Bikel, D., & Zitouni, I. (2012). Multilingual natural language processing applications: from theory to practice. IBM Press.

### Semantic web

enabling computers to understand _what is on the internet_ and _what you can do on the internet_

[Watch Bruce Willis use the semantic web](https://www.youtube.com/watch?v=-twzacZ1jrk) in 1997

### Personal assistants

such as Apple Siri, Amazon Alexa, or Google Now

<img src="media/siri.jpg" style="width: 400px;"/>


### Conversational agents (chatbots)

Systems that can, to some extent, _carry on human-like conversations_

<img src="media/cleverbot.jpg" style="width: 400px;"/>


## Semantic analysis / Semantic parsing

_The task of mapping linguistic units to some representation of their meaning_

- But what types of units?
  - Words?
  - Phrases, sentences?
  - Paragraphs, documents?

- And what representation? How do we represent meaning? What is meaning?
     - is it a graph?
     - or a formula in first-order logic?
     - or a real-valued vector?
     - or something else?

(We'll see some examples in 2 weeks)

__Semantic analysis__ is the process of determining the meaning of linguistic units. While all technologies introduced so far can benefit from such analyses, the ones discussed in this and the following two lectures are outright impossible unless we build explicit representations of the information content of linguistic data. Mapping linguistic data to some representation of meaning requires us to choose a __semantic representation__.

Today there exist dozens of different theories and systems of __semantic representation__. In syntactic or morphological analysis, there are theoretical concepts that are widely accepted by linguists and used by engineers, such as the constituent structure of a sentence or the concept of verb tense. There is no such agreement on the basic elements of semantic representation.

In a narrow sense, semantic analysis involves modeling the meaning of a sentence only as far as it can be determined without knowing the context in which it is uttered, the previous knowledge (information state) of each speaker, etc. Detecting meaning as a function of linguistic form only is sometimes called __syntax-driven semantic analysis__ or __semantic parsing__. 

In the broader sense, semantic analysis involves modeling all new information that an utterance conveys, and thus includes the process of __inference__. In this case the analysis of the sentence _What did you do today?_ should at least be aware of the identity of the person the question was addressed to and the exact time when it was uttered. It is far from trivial to define the limits of this broader process: the true scope of such a question is actually determined by factors such as the nature of the relationship between speakers (the answer is different if the question is asked by someone's boss or by a friend) or the history of interactions between them. The field of linguistics concerned with such factors is called __pragmatics__.

# 11.2 Word meaning
<a id='11.2'></a>

We know that _dog_ is a singular common noun, but how do we distinguish it from _cat_, _television_, _Monday_, or _peace_?

Two major approaches:

- decomposing meaning into __elements__ or __features__ (e.g. a dog is an _animal_, _four-legged_, _faithful_, etc.) (discrete representation)

- modeling meaning as __distribution__, i.e. the contexts in which a word appears (e.g. __dog__ is likely to appear in the context _I take my ... for a walk twice a day_) (continuous representation)

## Decomposing meaning

_dog_: animal, four-legged, faithful, barks

_peace_: period, no war

### Advantages

- transparent representation (we understand what each element means)

- makes it straightforward to model lexical relationships (synonymy, hypernymy, similarity, etc., see [11.3](#11.3) on what these mean)
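To make the second point concrete, here is a minimal sketch that treats each "definition" as a set of features. The feature sets and the helper names (`is_synonym`, `is_hypernym`, `similarity`) are made up for illustration and do not come from any real lexicon. Under this (very simplified) assumption, synonymy becomes set equality, hypernymy becomes a proper-subset test, and similarity can be measured as feature overlap.

```python
# Hand-written, illustrative feature sets; not taken from any real lexicon.
FEATURES = {
    "dog":    {"animal", "four-legged", "faithful", "barks"},
    "canine": {"animal", "four-legged", "faithful", "barks"},
    "cat":    {"animal", "four-legged", "meows"},
    "animal": {"animal"},
    "peace":  {"period", "no war"},
}

def is_synonym(a, b):
    """Two words count as synonyms if their feature sets are identical."""
    return FEATURES[a] == FEATURES[b]

def is_hypernym(general, specific):
    """`general` is a hypernym of `specific` if its features are a proper subset."""
    return FEATURES[general] < FEATURES[specific]

def similarity(a, b):
    """Jaccard overlap of the two feature sets, between 0 and 1."""
    fa, fb = FEATURES[a], FEATURES[b]
    return len(fa & fb) / len(fa | fb)

print(is_synonym("dog", "canine"))   # True
print(is_hypernym("animal", "dog"))  # True
print(similarity("dog", "cat"))      # 0.4
print(similarity("dog", "peace"))    # 0.0
```

The sketch also hints at the problems listed next: every result depends entirely on which features we chose to write down.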

### Problems

- what are the __primitives of representation__, i.e. what elements shall be used in such "definitions"

- how to determine the __exact set of elements__ in a definition: is _faithful_ an inherent property of _dog_? Is _peace_ really a _period_?

- should representations have additional structure? Is it only a list or maybe a graph?

## The distributional approach

Two words are similar in meaning if they appear in similar contexts. Typically we represent words using __real-valued vectors__ such that the distance between two vectors (e.g. Euclidean or cosine distance) is small when the words appear in similar contexts.
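The idea can be illustrated with raw co-occurrence counts. The sketch below is a toy, not how modern embeddings are trained: the four-sentence corpus, the window size, and the function names are assumptions chosen for the example. Each word is represented by the counts of its neighboring words, and two such vectors are compared with cosine similarity (the measure used again in section 11.5).

```python
import math
from collections import Counter, defaultdict

# Tiny made-up corpus; real models are built from very large text collections.
CORPUS = [
    "i take my dog for a walk",
    "i take my puppy for a walk",
    "i drive my car to work",
    "i drive my truck to work",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = cooccurrence_vectors(CORPUS)
print(cosine(vectors["dog"], vectors["puppy"]))  # identical contexts in this corpus
print(cosine(vectors["dog"], vectors["car"]))    # only "my" is shared
```

In this corpus _dog_ and _puppy_ occur in exactly the same contexts, so they receive the highest possible score, while _dog_ and _car_ share only one context word.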

### Advantages

- Robust, can be constructed from large unannotated data, which is available nowadays

- Has proven useful in virtually all NLP tasks

### Problems

- non-transparent representation: we cannot truly understand the meaning of a representation (e.g. the meaning of dimension $i$)

- unreliable for rare words, for which few contexts are observed, and a large part of the vocabulary of any dataset consists of rare words!


## 11.3 Lexical relationships: synonymy, homonymy, hypernymy
<a id='11.3'></a>

### Synonyms

Pairs of words that mean roughly the same thing are called __synonyms__

- _dog_ - _canine_
- _buy_ - _purchase_ 

Q: are there "perfect synonyms", ever, in any language? Depends on our definition of meaning!

### Hypernyms, hyponyms

A word is a __hypernym__ of another if it is a broader or more general concept of which the other is a special case, e.g. _mammal_ is the hypernym of _dog_, _rectangle_ is the hypernym of _square_.

We also say that _dog_ is a __hyponym__ of _mammal_ and _square_ is a hyponym of _rectangle_.
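Hypernym links form a hierarchy that can be followed upwards. The following sketch uses a hand-made toy taxonomy (the dictionary and both function names are assumptions for illustration) to show that hypernymy is transitive: if _dog_ is a hyponym of _mammal_ and _mammal_ is a hyponym of _animal_, then _dog_ is also a hyponym of _animal_.

```python
# Hand-made toy taxonomy: each word points to its direct hypernym.
HYPERNYM_OF = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "square": "rectangle",
    "rectangle": "polygon",
}

def hypernym_chain(word):
    """Return all hypernyms of `word`, from most specific to most general."""
    chain = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        chain.append(word)
    return chain

def is_hyponym_of(specific, general):
    """True if `general` is reachable from `specific` via hypernym links."""
    return general in hypernym_chain(specific)

print(hypernym_chain("dog"))               # ['mammal', 'animal']
print(is_hyponym_of("square", "polygon"))  # True
print(is_hyponym_of("mammal", "dog"))      # False
```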

Q: in what way is this similar to the IS_A relationship in programming?

### Homonyms, homophones

_Bank_ (as in financial institution) and _bank_ (as in the bank of a river) are __homonyms__ (they are spelled the same but have very different meanings)

Q: _glass_ (material) and _glass_ (dish) are not __homonyms__, but why?

_Two_ and _too_ are __homophones__, which means they are pronounced the same

## 11.4 Lexical ontologies
<a id='11.4'></a>

Some examples are:

- WordNet
- FrameNet
- 4lang

## WordNet

- widely used lexical database (project website: [https://wordnet.princeton.edu/](https://wordnet.princeton.edu/))

- groups words into sets of synonyms (__synsets__), and models semantic relationships among them

- available for dozens of languages (including Hungarian, see [http://rgai.inf.u-szeged.hu/index.php?lang=en&page=HuWN](http://rgai.inf.u-szeged.hu/index.php?lang=en&page=HuWN))

## WordNet example

![title](media/wordnet.jpg)

### WordNet example (cont'd)

![title](media/wordnet2.jpg)

## FrameNet

Website: [https://framenet.icsi.berkeley.edu/fndrupal/](https://framenet.icsi.berkeley.edu/fndrupal/)

A resource based on __Frame Semantics__ (see e.g. [Fillmore & Baker 2001](https://s3.amazonaws.com/academia.edu.documents/38607839/framenet.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1502707461&Signature=7W66yd%2FSTG8r3BU1DK86lz1ar%2FQ%3D&response-content-disposition=inline%3B%20filename%3DFrame_Semantics_for_Text_Understanding.pdf))



__Frames__ are script-like structures that represent a situation, event or object and list its typical participants or props, which are called __event roles__ (FrameNet's own term is __frame elements__)

[Here's an example](https://framenet2.icsi.berkeley.edu/fnReports/data/frame/Apply_heat.xml)

Has been used to train semantic parsers / __semantic role labelers__, e.g. [SEMAFOR](http://www.cs.cmu.edu/~ark/SEMAFOR/)
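As a rough illustration of what a frame and a role-labeled sentence might look like as data, here is a minimal sketch. The `Frame` class and the `annotate` helper are hypothetical and unrelated to any FrameNet or SEMAFOR API, and the roles and trigger words listed for `Apply_heat` are only a small, simplified subset; the frame report linked above is the authoritative source.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A frame: a name, its participant roles, and the words that evoke it."""
    name: str
    roles: list
    lexical_units: list = field(default_factory=list)

    def evokes(self, lemma):
        """True if `lemma` is listed as a trigger word of this frame."""
        return lemma in self.lexical_units

def annotate(frame, fillers):
    """Check that every filled role belongs to the frame, then return the pairing."""
    unknown = set(fillers) - set(frame.roles)
    if unknown:
        raise ValueError(f"roles not in frame {frame.name}: {sorted(unknown)}")
    return {"frame": frame.name, "roles": dict(fillers)}

# Simplified, partial rendering of the Apply_heat frame (illustrative subset only).
apply_heat = Frame(
    name="Apply_heat",
    roles=["Cook", "Food", "Heating_instrument"],
    lexical_units=["bake", "boil", "fry"],
)

# Hand-annotated example sentence: "Sally fried an egg"
if apply_heat.evokes("fry"):
    print(annotate(apply_heat, {"Cook": "Sally", "Food": "an egg"}))
```

A semantic role labeler has to produce this kind of output automatically: it decides which frame a word evokes and which spans of the sentence fill which roles.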

## 11.5 Semantic similarity
<a id='11.5'></a>

### Task definition

Measure the degree to which the meanings of two words are similar

e.g. _cat_ and _dog_ are more similar than _cat_ and _car_

Not a precise definition - that would require a model of meaning 

Datasets are created based on the human intuition of hundreds of annotators

### Motivation

various NLP tasks benefit from a similarity metric, e.g. machine translation and information retrieval (search).

for any task, extra data for rare words may be obtained through similar but more frequent words

models of word meaning can be evaluated based on their inherent concept of semantic distance/similarity

## Distributional approaches

__cosine similarity__ of word vectors is expected to correlate with semantic similarity

e.g. nearest neighbors in the [glove.6B.50d](https://nlp.stanford.edu/projects/glove/) embedding:

## Distributional approaches - example

words closest to __king__:       

| word    | cosine similarity |
|---------|-------------------|
| prince  | 0.824 |
| queen   | 0.784 |
| ii      | 0.775 |
| emperor | 0.774 |
| son     | 0.767 |

words closest to __dog__:

| word  | cosine similarity |
|-------|-------------------|
| cat   | 0.922 |
| dogs  | 0.851 |
| horse | 0.791 |
| puppy | 0.775 |
| pet   | 0.772 |
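Rankings like the ones above can be produced by sorting the vocabulary by cosine similarity to the query word's vector. The sketch below assumes the plain-text embedding format in which each line holds a word followed by its vector components separated by spaces. The function names, the three-dimensional toy vectors, and the file name in the final comment are assumptions for illustration; the toy vectors are invented and will not reproduce the scores in the tables.

```python
import math

def load_vectors(lines):
    """Parse lines of the form 'word v1 v2 ... vd' into a dict of float lists."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(word, vectors, k=5):
    """Return the k words most similar to `word`, with their cosine scores."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Made-up 3-dimensional vectors standing in for a real embedding file.
toy_lines = [
    "dog 0.9 0.8 0.1",
    "cat 0.8 0.9 0.1",
    "puppy 0.9 0.6 0.2",
    "car 0.1 0.2 0.9",
]
toy = load_vectors(toy_lines)
for neighbor, score in nearest("dog", toy, k=3):
    print(f"{neighbor}\t{score:.3f}")

# With a downloaded embedding file (the path is an assumption), one could write:
# with open("glove.6B.50d.txt", encoding="utf-8") as f:
#     glove = load_vectors(f)
# print(nearest("king", glove))
```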

## Distributional approaches - example (cont'd)

Not as reliable with less frequent words, e.g. __opossum__:

| word      | cosine similarity |
|-----------|-------------------|
| four-eyed | 0.752 |
| raccoon   | 0.717 |
| songbird  | 0.704 |

Or __woodpecker__:

| word         | cosine similarity |
|--------------|-------------------|
| pileated     | 0.805 |
| ivory-billed | 0.72 |
| red-cockaded | 0.71 |

### Ontology-based approaches

Distance between words in lexical graphs such as WordNet is also used as a source of semantic similarity

Path similarity in WordNet between __dog__ and some other synsets:

| synset       | path similarity |
|--------------|-----------------|
| canine       | 0.5 |
| wolf         | 0.33 |
| cat          | 0.2 |
| refrigerator | 0.07 |
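Path similarity is commonly defined (for example in NLTK's WordNet interface) as $1 / (1 + d)$, where $d$ is the number of edges on the shortest path between two synsets in the hypernym hierarchy. The sketch below applies this formula to a hand-made graph fragment that only loosely imitates WordNet's noun hierarchy; the graph and the function names are assumptions for illustration, and _refrigerator_ is omitted because its long path would require a much larger fragment.

```python
from collections import deque

# Hand-made fragment loosely modeled on WordNet's noun hierarchy (an assumption,
# not the real database): each node lists its direct hypernyms.
HYPERNYMS = {
    "dog": ["canine"],
    "wolf": ["canine"],
    "canine": ["carnivore"],
    "cat": ["feline"],
    "feline": ["carnivore"],
    "carnivore": [],
}

def neighbors(node):
    """Hypernym links are traversed in both directions (up and down)."""
    up = HYPERNYMS.get(node, [])
    down = [child for child, parents in HYPERNYMS.items() if node in parents]
    return up + down

def shortest_path_length(start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def path_similarity(a, b):
    """1 / (1 + shortest path length), so identical nodes score 1.0."""
    dist = shortest_path_length(a, b)
    return None if dist is None else 1 / (1 + dist)

for other in ["canine", "wolf", "cat"]:
    print(other, round(path_similarity("dog", other), 2))
# canine 0.5
# wolf 0.33
# cat 0.2
```

In this fragment _dog_ is one edge away from _canine_, two from _wolf_ and four from _cat_, which gives the same values as the first three rows of the table.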