# Semantic similarity methods for semantic parsing

Natural language allows us to express the same concept in different ways and with different words. Every language has synonyms and semantically related words. There are two common ways to recognize semantic similarity: looking words up in a synonym dictionary, or comparing them with word-vector-based similarity methods. We will discuss both approaches.

## Using synonym lists for semantic similarity

We already went through our dataset and saw that different verbs are used to express the same action. For instance, the verbs `landing`, `arriving`, and `flying to` carry the same meaning, whereas `leaving`, `departing`, and `flying from` form another semantic group.

We also saw that, in most cases, the transitive verb and its direct object express the intent. A simple way to determine whether two utterances represent the same intent is to check whether their verbs are synonyms and their direct objects are synonyms.

Let's compare two example utterances from the dataset. First, we prepare a small synonym dictionary that includes only the base forms (lemmas) of the verbs and nouns. When doing the comparison, we also use the base forms of the words in the utterances.

Each **synonym set (synset)** groups words that are interchangeable in our domain. We usually include both language-general synonyms (airplane-plane) and domain-specific synonyms (book-buy).

In [1]:
import spacy

verbSynsets = [
    ("show", "list"),
    ("book", "make a reservation", "buy", "reserve")
]

objSynsets = [
    ("meal", "food"),
    ("aircraft", "airplane", "plane")
]

nlp = spacy.load('en_core_web_md')
doc1 = nlp("show me all aircrafts that cp uses")
doc2 = nlp("list all meals on my flight")

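def extract(doc):
    # Return the lemmas of the first direct object and of the verb that governs it
    # (implicitly returns None when the utterance has no direct object)
    for token in doc:
        if token.dep_ == "dobj":
            obj = token.lemma_
            verb = token.head.lemma_
            return (verb, obj)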

verb1, obj1 = extract(doc1)
verb2, obj2 = extract(doc2)

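# Find the synset that contains the first verb, then test whether the second verb belongs to it
vsyn = [syn for syn in verbSynsets if verb1 in syn]
print(f'{verb2} in {vsyn[0]}: {verb2 in vsyn[0]}')

# Do the same for the direct objects
osyn = [syn for syn in objSynsets if obj1 in syn]
print(f'{obj2} in {osyn[0]}: {obj2 in osyn[0]}')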

list in ('show', 'list'): True
meal in ('aircraft', 'airplane', 'plane'): False


We deduce that the preceding two utterances do not refer to the same intent: the verbs are synonyms, but the direct objects are not.

Synonym lists work well for semantic similarity calculations, especially when the number of synonyms in the domain is small. They are not always applicable, though: scanning every synset for each word in a sentence can become inefficient when the synsets are large.
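One way to keep lookups cheap as the synsets grow is to invert them once into a dictionary that maps each word to the index of its synset, so that each comparison costs two hash lookups instead of a scan over every synset. The following is only a minimal sketch of that idea. The helper names `build_index` and `are_synonyms` are hypothetical, the synsets are the same ones used earlier (repeated here so the block runs on its own), and the sketch assumes that each word belongs to at most one synset.

```python
# Hypothetical helpers, not part of spaCy: invert synsets into a word -> synset-id map.
verb_synsets = [
    ("show", "list"),
    ("book", "make a reservation", "buy", "reserve"),
]
obj_synsets = [
    ("meal", "food"),
    ("aircraft", "airplane", "plane"),
]

def build_index(synsets):
    """Map every word to the position of the synset that contains it."""
    return {word: i for i, synset in enumerate(synsets) for word in synset}

def are_synonyms(index, word1, word2):
    """Two words are synonyms if both are in the index and share a synset id."""
    id1 = index.get(word1)
    return id1 is not None and id1 == index.get(word2)

verb_index = build_index(verb_synsets)
obj_index = build_index(obj_synsets)

print(are_synonyms(verb_index, "show", "list"))     # True
print(are_synonyms(obj_index, "aircraft", "meal"))  # False
```

Note that in this sketch a word missing from the index is never reported as a synonym of anything, not even of itself, so out-of-vocabulary lemmas would need separate handling.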

## Using word vectors to recognize semantic similarity

Word vectors offer a convenient way to calculate semantic similarity without maintaining synonym lists.
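By default, spaCy's `similarity` method compares vectors with cosine similarity, so it may help to see what that score measures before using it. The sketch below computes cosine similarity in plain Python; the function name `cosine_similarity` is hypothetical and the three-dimensional vectors are invented purely for illustration (real `en_core_web_md` vectors have 300 dimensions and different values).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: a zero vector is similar to nothing
        return 0.0
    return dot / (norm_u * norm_v)

# Toy vectors, made up for illustration only
plane = [0.9, 0.1, 0.0]
aircraft = [0.8, 0.2, 0.1]
meal = [0.0, 0.2, 0.9]

# Vectors pointing in similar directions score higher than unrelated ones
print(cosine_similarity(plane, aircraft) > cosine_similarity(plane, meal))  # True
```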

In [2]:
import spacy

nlp = spacy.load('en_core_web_md')
doc1 = nlp("show me all aircrafts that cp uses")
doc2 = nlp("list all meals on my flight")

def extract(doc):
    # Return the Token objects (not the lemmas) so that their word vectors can be compared
    for token in doc:
        if token.dep_ == "dobj":
            obj = token
            verb = token.head
            return (verb, obj)

verb1, obj1 = extract(doc1)
verb2, obj2 = extract(doc2)

print(f'verb similarity: {verb1.similarity(verb2)}')
print(f'object similarity: {obj1.similarity(obj2)}')

verb similarity: 0.2340034395456314
object similarity: 0.12948690354824066


Both similarity scores are low. Once again, we deduce that these two utterances do not represent the same intent.
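To turn "low" into an explicit decision, one option is to compare both scores against a cutoff. The sketch below assumes a threshold of 0.5, which is an arbitrary illustrative value rather than a recommendation; in practice it would need to be tuned on labeled utterances. The function name `same_intent` is hypothetical, and the inputs are the two scores printed above.

```python
def same_intent(verb_similarity, obj_similarity, threshold=0.5):
    """Treat two utterances as the same intent only if both scores reach the threshold.

    The default threshold of 0.5 is an assumption made for this sketch.
    """
    return verb_similarity >= threshold and obj_similarity >= threshold

# Scores taken from the output above
print(same_intent(0.2340034395456314, 0.12948690354824066))  # False
```

Requiring both scores to pass mirrors the synonym-based check, where the verbs and the direct objects each had to match.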