# FLIP(01):  Advanced Data Science
**(Module 03: Natural Language Processing)**

---
- Materials in this module include resources collected from various open-source online repositories.
- You are free to use, but NOT allowed to change or distribute this package.

Prepared by and for 
**Student Members** |
2006-2018 [TULIP Lab](http://www.tulip.org.au)

---


# Session 09 - Analyzing the Meaning of Sentences

### Natural Language Understanding

#### Querying a Database

In this section, we will show that translating an English question into a database query is fairly straightforward when the domain is restricted. We will also see, however, that addressing the problem more generally requires a whole new set of ideas and techniques, involving the representation of meaning.

In [None]:
import nltk
nltk.data.show_cfg('grammars/book_grammars/sql0.fcfg')

In [None]:
from nltk import load_parser

In [None]:
cp = load_parser('grammars/book_grammars/sql0.fcfg')

In [None]:
query = 'What cities are located in China'

In [None]:
trees = cp.nbest_parse(query.split())

In [None]:
answer = trees[0].node['sem']

In [None]:
q = ' '.join(answer)

In [None]:
print q

In [None]:
from nltk.sem import chat80

In [None]:
rows = chat80.sql_query('corpora/city_database/city.db', q)

In [None]:
for r in rows:
    print r[0]
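The grammar-based pipeline above turns one fixed kind of question into an SQL query and runs it. To make the restricted-domain idea concrete without a grammar, here is a minimal sketch that replaces `sql0.fcfg` with a single regular expression and `city.db` with an in-memory SQLite table. The table and column names (`city_table`, `City`, `Country`), the toy rows and the helper `question_to_sql` are assumptions for illustration, not the actual schema or contents of `city.db`.

```python
import re
import sqlite3

# Hypothetical stand-in for the sql0.fcfg grammar: it recognises exactly
# one question shape, "What cities are located in <Country>".
PATTERN = re.compile(r'^What cities are located in (\w+)$')

def question_to_sql(question):
    """Translate the one supported question shape into a parameterised query."""
    match = PATTERN.match(question)
    if match is None:
        raise ValueError('question outside the restricted domain: %r' % question)
    # A placeholder keeps the country value out of the SQL string itself.
    return 'SELECT City FROM city_table WHERE Country = ?', (match.group(1).lower(),)

# Toy in-memory table; schema assumed, rows purely illustrative.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE city_table (City TEXT, Country TEXT)')
conn.executemany('INSERT INTO city_table VALUES (?, ?)',
                 [('shanghai', 'china'), ('peking', 'china'), ('paris', 'france')])

sql, params = question_to_sql('What cities are located in China')
cities = sorted(row[0] for row in conn.execute(sql, params))
print(cities)   # ['peking', 'shanghai']
```

Any rephrasing, such as *Which cities are in China*, falls outside the pattern and raises an error; that brittleness is exactly why the general problem calls for explicit representations of meaning.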

# Propositional Logic

A logical language is designed to make reasoning formally explicit. As a result, it can capture aspects of natural language which determine whether a set of sentences is consistent. As part of this approach, we need to develop logical representations of a sentence φ that formally capture the truth-conditions of φ.

In [None]:
nltk.boolean_ops()

In [None]:
lp = nltk.LogicParser()

In [None]:
lp.parse('-(P & Q)')

In [None]:
lp.parse('P & Q')

In [None]:
lp.parse('P | (R -> Q)')

In [None]:
lp.parse('P <-> -- P')

In [None]:
lp = nltk.LogicParser()

In [None]:
SnF = lp.parse('SnF')

In [None]:
NotFnS = lp.parse('-FnS')

In [None]:
R = lp.parse('SnF -> -FnS')

In [None]:
prover = nltk.Prover9()

In [None]:
prover.prove(NotFnS, [SnF, R])

In [None]:
val = nltk.Valuation([('P', True), ('Q', True), ('R', False)])

In [None]:
val['P']

In [None]:
dom = set([])

In [None]:
g = nltk.Assignment(dom)

In [None]:
m = nltk.Model(dom, val)

In [None]:
print m.evaluate('(P & Q)', g)

In [None]:
print m.evaluate('-(P & Q)', g)

In [None]:
print m.evaluate('(P & R)', g)

In [None]:
print m.evaluate('(P | R)', g)
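The prover call and the model evaluations above answer two related questions: does a formula follow from some premises, and is it true under one particular valuation? For propositional logic, both can be settled by brute force over truth tables. The sketch below does that with formulas encoded as nested tuples such as `('->', 'SnF', ('-', 'FnS'))`; the encoding and the helpers `evaluate`, `atoms` and `entails` are hypothetical, and truth-table enumeration is a swapped-in technique, not how Prover9 searches for proofs.

```python
from itertools import product

def evaluate(formula, valuation):
    """Truth value of a formula under a valuation (dict: atom -> bool)."""
    if isinstance(formula, str):            # an atom such as 'P'
        return valuation[formula]
    op, *args = formula
    if op == '-':
        return not evaluate(args[0], valuation)
    left, right = (evaluate(arg, valuation) for arg in args)
    if op == '&':
        return left and right
    if op == '|':
        return left or right
    if op == '->':
        return (not left) or right
    if op == '<->':
        return left == right
    raise ValueError('unknown operator: %r' % op)

def atoms(formula):
    """Set of atom names occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(atoms(arg) for arg in formula[1:]))

def entails(premises, goal):
    """True if goal holds in every valuation that makes all premises true."""
    names = sorted(set().union(atoms(goal), *(atoms(p) for p in premises)))
    for values in product([True, False], repeat=len(names)):
        valuation = dict(zip(names, values))
        if all(evaluate(p, valuation) for p in premises) and not evaluate(goal, valuation):
            return False                    # found a counter-valuation
    return True

toy_val = {'P': True, 'Q': True, 'R': False}
print(evaluate(('&', 'P', 'Q'), toy_val))          # True
print(evaluate(('-', ('&', 'P', 'Q')), toy_val))   # False
print(evaluate(('&', 'P', 'R'), toy_val))          # False
print(evaluate(('|', 'P', 'R'), toy_val))          # True

rule = ('->', 'SnF', ('-', 'FnS'))
print(entails(['SnF', rule], ('-', 'FnS')))        # True
print(entails(['SnF', rule], 'FnS'))               # False
```

Enumeration costs 2^n valuations for n atoms, which is fine for toy examples but is one reason practical systems rely on proof search instead.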

# First-Order Logic

In the remainder of this chapter, we will represent the meaning of natural language expressions by translating them into first-order logic. Not all of natural language semantics can be expressed in first-order logic. But it is a good choice for computational semantics because it is expressive enough to represent many aspects of semantics, and on the other hand, there are excellent systems available off the shelf for carrying out automated inference in first-order logic.

## Syntax

First-order logic keeps all the Boolean operators of propositional logic, but it adds some important new mechanisms.

In [None]:
tlp = nltk.LogicParser(type_check=True)

In [None]:
parsed = tlp.parse('walk(angus)')

In [None]:
parsed.argument

In [None]:
parsed.argument.type

In [None]:
parsed.function

In [None]:
parsed.function.type

In [None]:
sig = {'walk': '<e, t>'}

In [None]:
parsed = tlp.parse('walk(angus)', sig)
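The type checker assigns `angus` the basic type `e` (entity), and the signature `sig` declares `walk` to be of type `<e, t>`, a function from entities to truth values. The rule being enforced is simple: applying a function of type `<A, B>` to an argument of type `A` yields something of type `B`. A minimal sketch of just that rule follows; `parse_type` and `apply_type` are hypothetical helpers and perform none of NLTK's type inference.

```python
def parse_type(text):
    """Parse 'e', 't' or '<A, B>' (nesting allowed) into 'e', 't' or a pair (A, B)."""
    s = text.replace(' ', '')

    def expect(i, ch):
        if i >= len(s) or s[i] != ch:
            raise ValueError('expected %r at position %d in %r' % (ch, i, text))
        return i + 1

    def parse(i):
        if i < len(s) and s[i] in 'et':    # basic types
            return s[i], i + 1
        i = expect(i, '<')                  # complex type <argument, result>
        arg, i = parse(i)
        i = expect(i, ',')
        res, i = parse(i)
        i = expect(i, '>')
        return (arg, res), i

    result, end = parse(0)
    if end != len(s):
        raise ValueError('trailing characters in %r' % text)
    return result

def apply_type(fun_type, arg_type):
    """Result type of applying a function of fun_type to an argument of arg_type."""
    if not isinstance(fun_type, tuple) or fun_type[0] != arg_type:
        raise TypeError('cannot apply %r to %r' % (fun_type, arg_type))
    return fun_type[1]

walk_type = parse_type('<e, t>')
print(apply_type(walk_type, 'e'))   # t, so walk(angus) is a formula
```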

In [None]:
lp = nltk.LogicParser()

In [None]:
lp.parse('dog(cyril)').free()

In [None]:
lp.parse('dog(x)').free()

In [None]:
lp.parse('own(angus, cyril)').free()

In [None]:
lp.parse('exists x.dog(x)').free()

In [None]:
lp.parse('((some x. walk(x)) -> sing(x))').free()

In [None]:
lp.parse('exists x.own(y, x)').free()
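`free()` returns the variables that no quantifier binds. The rule is recursive: an atomic formula contributes its variable arguments, connectives take the union over their parts, and `exists x` or `all x` removes `x` from the free variables of its body. The sketch below applies that rule to formulas encoded as nested tuples; the encoding, the function name `free`, and the convention that single lowercase letters (optionally followed by digits) are variables while longer names are constants are all assumptions made for the example.

```python
import re

IS_VAR = re.compile(r'^[a-z]\d*$')   # assumed convention: x, y, z1, ... are variables

def free(formula):
    """Set of free individual variables in a tuple-encoded formula."""
    op = formula[0]
    if op in ('exists', 'all'):               # ('exists', 'x', body)
        return free(formula[2]) - {formula[1]}
    if op in ('-', '&', '|', '->', '<->'):    # connectives
        return set().union(*(free(sub) for sub in formula[1:]))
    # Otherwise an atom such as ('own', 'angus', 'cyril').
    return {arg for arg in formula[1:] if IS_VAR.match(arg)}

print(free(('dog', 'cyril')))                                     # set()
print(free(('dog', 'x')))                                         # {'x'}
print(free(('->', ('exists', 'x', ('walk', 'x')), ('sing', 'x'))))  # {'x'}
print(free(('exists', 'x', ('own', 'y', 'x'))))                   # {'y'}
```

The third case shows why bracketing matters: the `x` in `sing(x)` lies outside the scope of `exists x`, so it stays free.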

## First-Order Theorem Proving

The general case in theorem proving is to determine whether a formula that we want to prove (a proof goal) can be derived by a finite sequence of inference steps from a list of assumed formulas.

In [None]:
NotFnS = lp.parse('-north_of(f, s)')

In [None]:
SnF = lp.parse('north_of(s, f)')

In [None]:
R = lp.parse('all x. all y. (north_of(x, y) -> -north_of(y, x))')

In [None]:
prover = nltk.Prover9()

In [None]:
prover.prove(NotFnS, [SnF, R])

In [None]:
FnS = lp.parse('north_of(f, s)')

In [None]:
prover.prove(FnS, [SnF, R])
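Logically, `-north_of(f, s)` follows from the two assumptions while `north_of(f, s)` does not. A much weaker but easy-to-inspect check is to search for countermodels by brute force: enumerate every interpretation of `north_of` over a small domain, keep those in which the assumptions hold, and test the goal in each. The sketch assumes a two-element domain in which the constants `f` and `s` denote distinct individuals; all names in it are hypothetical, and finding no countermodel there is evidence rather than proof, since first-order validity quantifies over all domains.

```python
from itertools import product

DOMAIN = ['f', 's']   # assumption: f and s name two distinct individuals
PAIRS = [(x, y) for x in DOMAIN for y in DOMAIN]

def assumptions_hold(north_of):
    s_north_of_f = ('s', 'f') in north_of
    # all x. all y. (north_of(x, y) -> -north_of(y, x)); with x = y this
    # also rules out any individual being north of itself.
    asymmetric = all(not ((x, y) in north_of and (y, x) in north_of)
                     for x in DOMAIN for y in DOMAIN)
    return s_north_of_f and asymmetric

def countermodels(goal):
    """Interpretations of north_of where the assumptions hold but goal fails."""
    found = []
    for bits in product([False, True], repeat=len(PAIRS)):
        north_of = {pair for pair, keep in zip(PAIRS, bits) if keep}
        if assumptions_hold(north_of) and not goal(north_of):
            found.append(north_of)
    return found

def not_f_north_of_s(north_of):
    return ('f', 's') not in north_of

def f_north_of_s(north_of):
    return ('f', 's') in north_of

print(countermodels(not_f_north_of_s))   # []
print(countermodels(f_north_of_s))       # [{('s', 'f')}]
```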

## Truth in Model

Relations are represented semantically in NLTK in the standard set-theoretic way: as sets of tuples. For example, let’s suppose we have a domain of discourse consisting of the individuals Bertie, Olive, and Cyril, where Bertie is a boy, Olive is a girl, and Cyril is a dog.

In [None]:
dom = set(['b', 'o', 'c'])

In [None]:
v = """
    bertie => b
    olive => o
    cyril => c
    boy => {b}
    girl => {o}
    dog => {c}
    walk => {o, c}
    see => {(b, o), (c, b), (o, c)}
    """

In [None]:
val = nltk.parse_valuation(v)

In [None]:
print val

In [None]:
('o', 'c') in val['see']

In [None]:
('b',) in val['boy']
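The valuation string pairs each symbol with its denotation: a name maps to an individual, and a predicate maps to a set of tuples. A one-place predicate such as `boy` therefore becomes a set of 1-tuples, which is why the membership test above uses `('b',)`. The toy parser below imitates that mapping for the simple format used here; `parse_toy_valuation` and `holds` are hypothetical helpers, not part of NLTK, and they understand nothing beyond names, sets of names and sets of pairs.

```python
import re

TOY_VALUATION = """
    bertie => b
    olive => o
    cyril => c
    boy => {b}
    girl => {o}
    dog => {c}
    walk => {o, c}
    see => {(b, o), (c, b), (o, c)}
    """

def parse_toy_valuation(text):
    """Map each symbol to an individual (a string) or a relation (a set of tuples)."""
    valuation = {}
    for line in text.strip().splitlines():
        symbol, value = [part.strip() for part in line.split('=>')]
        if value.startswith('{'):
            relation = set()
            # Each match is either a parenthesised tuple or a bare individual.
            for tup, single in re.findall(r'\(([^)]*)\)|(\w+)', value[1:-1]):
                if tup:
                    relation.add(tuple(item.strip() for item in tup.split(',')))
                else:
                    relation.add((single,))   # unary predicates hold 1-tuples
            valuation[symbol] = relation
        else:
            valuation[symbol] = value
    return valuation

def holds(valuation, predicate, *names):
    """True if the individuals denoted by names stand in the given relation."""
    return tuple(valuation[name] for name in names) in valuation[predicate]

toy_val = parse_toy_valuation(TOY_VALUATION)
print(holds(toy_val, 'see', 'olive', 'cyril'))   # True
print(holds(toy_val, 'boy', 'bertie'))           # True
print(holds(toy_val, 'walk', 'bertie'))          # False
```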

## Individual Variables and Assignments

In our models, the counterpart of a context of use is a variable assignment. This is a mapping from individual variables to entities in the domain. Assignments are created using the Assignment constructor, which also takes the model’s domain of discourse as a parameter.

In [None]:
g = nltk.Assignment(dom, [('x', 'o'), ('y', 'c')])

In [None]:
g

In [None]:
print g

In [None]:
m = nltk.Model(dom, val)

In [None]:
m.evaluate('see(olive, y)', g)

In [None]:
g['y']

In [None]:
m.evaluate('see(y, x)', g)

In [None]:
g.purge()

In [None]:
m.evaluate('see(olive, y)', g)

In [None]:
m.evaluate('see(bertie, olive) & boy(bertie) & -walk(bertie)', g)
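The division of labour above is that the valuation fixes names and predicates, while the assignment `g` supplies values for free variables; once `g.purge()` empties the assignment, `see(olive, y)` no longer has a truth value. The sketch below models that split with two plain dictionaries. The helpers `denote` and `evaluate_atom` are hypothetical, treating single lowercase letters as variables is an assumption, and the sketch signals a missing value with `None` rather than reproducing NLTK's exact output.

```python
import re

IS_VAR = re.compile(r'^[a-z]\d*$')   # assumed convention: x, y, z1, ... are variables

toy_val = {'bertie': 'b', 'olive': 'o', 'cyril': 'c',
           'see': {('b', 'o'), ('c', 'b'), ('o', 'c')}}

def denote(term, valuation, assignment):
    """Variables get their value from the assignment, names from the valuation."""
    return assignment[term] if IS_VAR.match(term) else valuation[term]

def evaluate_atom(predicate, args, valuation, assignment):
    """Truth value of predicate(args...), or None if some term has no value."""
    try:
        values = tuple(denote(arg, valuation, assignment) for arg in args)
    except KeyError:
        return None
    return values in valuation[predicate]

toy_g = {'x': 'o', 'y': 'c'}
print(evaluate_atom('see', ['olive', 'y'], toy_val, toy_g))   # True: ('o', 'c') is in see
print(evaluate_atom('see', ['y', 'x'], toy_val, toy_g))       # False: ('c', 'o') is not
toy_g.clear()                                                 # compare g.purge()
print(evaluate_atom('see', ['olive', 'y'], toy_val, toy_g))   # None: y is unbound
```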

## Quantification

One of the crucial insights of modern logic is that the notion of variable satisfaction can be used to provide an interpretation for quantified formulas.

In [None]:
m.evaluate('exists x.(girl(x) & walk(x))', g)

In [None]:
m.evaluate('girl(x) & walk(x)', g.add('x', 'o'))

In [None]:
fmla1 = lp.parse('girl(x) | boy(x)')

In [None]:
m.satisfiers(fmla1, 'x', g)

In [None]:
fmla2 = lp.parse('girl(x) -> walk(x)')

In [None]:
m.satisfiers(fmla2, 'x', g)

In [None]:
fmla3 = lp.parse('walk(x) -> girl(x)')

In [None]:
m.satisfiers(fmla3, 'x', g)

In [None]:
m.evaluate('all x.(girl(x) -> walk(x))', g)
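`satisfiers` collects every individual in the domain that makes an open formula true when assigned to the variable. The quantifiers can then be read off that set: `exists x.φ` is true iff the set is non-empty, and `all x.φ` is true iff it is the whole domain. A plain-Python sketch over the Bertie/Olive/Cyril model, in which each open formula is written as a Python function of one individual (an encoding chosen for illustration; all helper names are hypothetical):

```python
toy_dom = {'b', 'o', 'c'}
boy, girl, walk = {'b'}, {'o'}, {'o', 'c'}

def satisfiers(formula, domain):
    """Individuals d in domain for which formula(d) is true."""
    return {d for d in domain if formula(d)}

def exists(formula, domain):
    return bool(satisfiers(formula, domain))

def forall(formula, domain):
    return satisfiers(formula, domain) == set(domain)

def girl_or_boy(x):          # girl(x) | boy(x)
    return x in girl or x in boy

def girl_implies_walk(x):    # girl(x) -> walk(x)
    return x not in girl or x in walk

def walk_implies_girl(x):    # walk(x) -> girl(x)
    return x not in walk or x in girl

print(sorted(satisfiers(girl_or_boy, toy_dom)))         # ['b', 'o']
print(sorted(satisfiers(girl_implies_walk, toy_dom)))   # ['b', 'c', 'o']
print(sorted(satisfiers(walk_implies_girl, toy_dom)))   # ['b', 'o']
print(forall(girl_implies_walk, toy_dom))               # True
```

Bertie and Cyril satisfy `girl(x) -> walk(x)` simply because they are not girls; that vacuous truth is what makes `all x.(girl(x) -> walk(x))` come out true.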

## Quantifier Scope Ambiguity

In [None]:
v2 = """
    bruce => b
    cyril => c
    elspeth => e
    julia => j
    matthew => m
    person => {b, e, j, m}
    admire => {(j, b), (b, b), (m, e), (e, m), (c, a)}
    """

In [None]:
val2 = nltk.parse_valuation(v2)
dom2 = val2.domain

In [None]:
m2 = nltk.Model(dom2, val2)

In [None]:
g2 = nltk.Assignment(dom2)

In [None]:
fmla4 = lp.parse('(person(x) -> exists y.(person(y) & admire(x, y)))')

In [None]:
m2.satisfiers(fmla4, 'x', g2)

In [None]:
fmla5 = lp.parse('(person(y) & all x.(person(x) -> admire(x, y)))')

In [None]:
m2.satisfiers(fmla5, 'y', g2)

In [None]:
fmla6 = lp.parse('(person(y) & all x.((x = bruce | x = julia) -> admire(x, y)))')

In [None]:
m2.satisfiers(fmla6, 'y', g2)
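The open formulas `fmla4` and `fmla5` are the bodies of the two scopings of a sentence such as *Everybody admires someone*: with the universal outermost, each person admires some person or other; with the existential outermost, there is a single person whom everybody admires. The sketch below evaluates both closed readings directly over the `person` and `admire` relations of `v2`; the variable names are illustrative only.

```python
person = {'b', 'e', 'j', 'm'}
admire = {('j', 'b'), ('b', 'b'), ('m', 'e'), ('e', 'm'), ('c', 'a')}

# all x.(person(x) -> exists y.(person(y) & admire(x, y)))
wide_universal = all(any((x, y) in admire for y in person) for x in person)

# exists y.(person(y) & all x.(person(x) -> admire(x, y)))
wide_existential = any(all((x, y) in admire for x in person) for y in person)

print(wide_universal)     # True
print(wide_existential)   # False
```

Restricting each loop to `person` plays the role of the `person(x) ->` and `person(y) &` guards. The existential-wide reading entails the universal-wide one but not conversely, which is why the two can differ in truth value, as they do in this model.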

## Model Building

So far we have assumed that we already had a model and wanted to check the truth of a sentence in it. By contrast, model building tries to create a new model, given some set of sentences. If it succeeds, then we know that the set is consistent, since the model itself is an existence proof.

In [None]:
a3 = lp.parse('exists x.(man(x) & walks(x))')

In [None]:
c1 = lp.parse('mortal(socrates)')

In [None]:
c2 = lp.parse('-mortal(socrates)')

In [None]:
mb = nltk.Mace(5)

In [None]:
print mb.build_model(None, [a3, c1])

In [None]:
print mb.build_model(None, [a3, c2])

In [None]:
print mb.build_model(None, [c1, c2])

In [None]:
a4 = lp.parse('exists y. (woman(y) & all x. (man(x) -> love(x,y)))')

In [None]:
a5 = lp.parse('man(adam)')

In [None]:
a6 = lp.parse('woman(eve)')

In [None]:
goal = lp.parse('love(adam,eve)')

In [None]:
mc = nltk.MaceCommand(goal, assumptions=[a4, a5, a6])

In [None]:
mc.build_model()

In [None]:
print mc.valuation

In [None]:
a7 = lp.parse('all x. (man(x) -> -woman(x))')

In [None]:
goal = lp.parse('love(adam,eve)')

In [None]:
mc = nltk.MaceCommand(goal, assumptions=[a4, a5, a6, a7])

In [None]:
mc.build_model()

In [None]:
print mc.valuation
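Roughly speaking, a model builder such as Mace looks for a finite interpretation of the vocabulary that makes every given sentence true, trying small domains first. The sketch below imitates that search for the three sentences `a3`, `c1` and `c2` by naive enumeration, which is not Mace's own algorithm, and only for the fixed vocabulary `man`, `walks`, `mortal` and `socrates`. Each sentence is hand-translated into a Python test on a candidate model, and `build_toy_model` is a hypothetical helper.

```python
from itertools import product

def subsets(domain):
    """All subsets of domain, as frozensets."""
    for bits in product([False, True], repeat=len(domain)):
        yield frozenset(d for d, keep in zip(domain, bits) if keep)

def build_toy_model(sentences, max_size=3):
    """A smallest model (up to max_size individuals) satisfying all sentences, or None."""
    for size in range(1, max_size + 1):
        domain = list(range(size))
        extensions = list(subsets(domain))
        for socrates in domain:
            for man, walks, mortal in product(extensions, repeat=3):
                model = {'domain': domain, 'socrates': socrates,
                         'man': man, 'walks': walks, 'mortal': mortal}
                if all(sentence(model) for sentence in sentences):
                    return model
    return None

def some_man_walks(m):        # exists x.(man(x) & walks(x))
    return any(x in m['man'] and x in m['walks'] for x in m['domain'])

def socrates_mortal(m):       # mortal(socrates)
    return m['socrates'] in m['mortal']

def socrates_not_mortal(m):   # -mortal(socrates)
    return m['socrates'] not in m['mortal']

print(build_toy_model([some_man_walks, socrates_mortal]) is not None)       # True
print(build_toy_model([some_man_walks, socrates_not_mortal]) is not None)   # True
print(build_toy_model([socrates_mortal, socrates_not_mortal]))              # None
```

A `None` result only says that no model exists up to `max_size`; here the two sentences are outright contradictory, so no larger domain would help either.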

# The Semantics of English Sentences

## The λ-Calculus

Remember that the backslash `\` is a special character in Python strings. We must either escape it (with another `\`) or use “raw strings”, as shown here:

In [None]:
lp = nltk.LogicParser()

In [None]:
e = lp.parse(r'\x.(walk(x) & chew_gum(x))')

In [None]:
e

In [None]:
e.free()

In [None]:
print lp.parse(r'\x.(walk(x) & chew_gum(y))')

In [None]:
e = lp.parse(r'\x.(walk(x) & chew_gum(x))(gerald)')

In [None]:
print e

In [None]:
print e.simplify()

In [None]:
print lp.parse(r'\x.\y.(dog(x) & own(y, x))(cyril)').simplify()

In [None]:
print lp.parse(r'\x y.(dog(x) & own(y, x))(cyril, angus)').simplify()

In [None]:
e1 = lp.parse('exists x.P(x)')

In [None]:
print e1

In [None]:
e2 = e1.alpha_convert(nltk.Variable('z'))

In [None]:
print e2

In [None]:
e1 == e2

In [None]:
e3 = lp.parse(r'\P.exists x.P(x)(\y.see(y, x))')

In [None]:
print e3

In [None]:
print e3.simplify()
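`simplify()` performs β-reduction: applying `\x.φ` to an argument substitutes the argument for the free occurrences of `x` in φ, renaming bound variables where necessary so that a free variable in the argument is not captured (the situation in `e3`, where the argument's free `x` meets `exists x`). The sketch below implements capture-avoiding substitution over a small tuple encoding of terms. The encoding, the `z1, z2, ...` renaming scheme and all function names are assumptions for illustration, not NLTK internals.

```python
BINDERS = ('lam', 'exists')   # node kinds shaped (kind, variable, body)

def free_vars(term):
    """Variables occurring free in term."""
    kind = term[0]
    if kind == 'var':
        return {term[1]}
    if kind == 'const':
        return set()
    if kind in BINDERS:
        return free_vars(term[2]) - {term[1]}
    # 'app', 'and' and 'pred' nodes: union over the sub-terms
    return set().union(*(free_vars(t) for t in term[1:] if isinstance(t, tuple)))

def fresh(avoid):
    """First of z1, z2, ... that is not in avoid."""
    n = 1
    while 'z%d' % n in avoid:
        n += 1
    return 'z%d' % n

def subst(term, name, value):
    """Replace free occurrences of variable name by value, avoiding capture."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == name else term
    if kind == 'const':
        return term
    if kind in BINDERS:
        var, body = term[1], term[2]
        if var == name:                  # name is rebound here: nothing to replace
            return term
        if var in free_vars(value):      # alpha-convert before substituting
            new = fresh(free_vars(value) | free_vars(body) | {name})
            body, var = subst(body, var, ('var', new)), new
        return (kind, var, subst(body, name, value))
    return (kind,) + tuple(subst(t, name, value) if isinstance(t, tuple) else t
                           for t in term[1:])

def simplify(term):
    """Beta-reduce every lambda application until none is left."""
    kind = term[0]
    if kind == 'app':
        fun, arg = simplify(term[1]), simplify(term[2])
        if fun[0] == 'lam':
            return simplify(subst(fun[2], fun[1], arg))
        return ('app', fun, arg)
    if kind in ('var', 'const'):
        return term
    return (kind,) + tuple(simplify(t) if isinstance(t, tuple) else t
                           for t in term[1:])

def show(term):
    """Render a term in notation close to NLTK's."""
    kind = term[0]
    if kind in ('var', 'const'):
        return term[1]
    if kind == 'lam':
        return '\\%s.%s' % (term[1], show(term[2]))
    if kind == 'exists':
        return 'exists %s.%s' % (term[1], show(term[2]))
    if kind == 'app':
        return '%s(%s)' % (show(term[1]), show(term[2]))
    if kind == 'and':
        return '(%s & %s)' % (show(term[1]), show(term[2]))
    return '%s(%s)' % (term[1], ','.join(show(a) for a in term[2:]))  # 'pred'

var_x, var_y = ('var', 'x'), ('var', 'y')

# \x.(walk(x) & chew_gum(x))(gerald)
walk_chew = ('app',
             ('lam', 'x', ('and', ('pred', 'walk', var_x),
                                  ('pred', 'chew_gum', var_x))),
             ('const', 'gerald'))
print(show(simplify(walk_chew)))    # (walk(gerald) & chew_gum(gerald))

# \P.exists x.P(x)(\y.see(y, x)): the argument's free x must not be captured
exists_see = ('app',
              ('lam', 'P', ('exists', 'x', ('app', ('var', 'P'), var_x))),
              ('lam', 'y', ('pred', 'see', var_y, var_x)))
print(show(simplify(exists_see)))   # exists z1.see(z1,x)
```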

## Quantified NPs

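For a simple sentence such as *Cyril barks*, building a semantic representation just means applying the verb's λ-abstraction `\x.bark(x)` to the constant `cyril`. You would be forgiven for thinking this was all too easy; surely there is a bit more to building compositional semantics. Quantified noun phrases such as *a dog* or *every dog* are where the extra work begins.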

In [None]:
lp = nltk.LogicParser()

In [None]:
tvp = lp.parse(r'\X x.X(\y.chase(x,y))')

In [None]:
np = lp.parse(r'(\P.exists x.(dog(x) & P(x)))')

In [None]:
vp = nltk.ApplicationExpression(tvp, np)

In [None]:
print vp

In [None]:
print vp.simplify()

In [None]:
from nltk import load_parser

In [None]:
parser = load_parser('grammars/book_grammars/simple-sem.fcfg', trace=0)

In [None]:
sentence = 'Angus gives a bone to every dog'

In [None]:
tokens = sentence.split()

In [None]:
trees = parser.nbest_parse(tokens)

In [None]:
for tree in trees:
    print tree.node['SEM']

In [None]:
v = """
    bertie => b
    olive => o
    cyril => c
    boy => {b}
    girl => {o}
    dog => {c}
    walk => {o, c}
    see => {(b, o), (c, b), (o, c)}
    """

In [None]:
val = nltk.parse_valuation(v)

In [None]:
g = nltk.Assignment(val.domain)

In [None]:
m = nltk.Model(val.domain, val)

In [None]:
sent = 'Cyril sees every boy'

In [None]:
grammar_file = 'grammars/book_grammars/simple-sem.fcfg'

In [None]:
results = nltk.batch_evaluate([sent], grammar_file, m, g)[0]

In [None]:
for (syntree, semrep, value) in results:
    print semrep
    print value
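The representation `\P.exists x.(dog(x) & P(x))` treats a quantified NP as a function from properties to truth values. Over a finite model, that idea can be run directly with Python functions: a determiner takes the set denoted by its noun and returns such a function, and a transitive verb takes an NP meaning and returns a property of the subject, with the same shape as `tvp`. The helpers `every`, `some` and `see` below are a hypothetical extensional sketch over the same individuals and relations as the valuation `v` above, not the grammar's compositional machinery.

```python
# Toy model: same individuals and relations as the valuation v above.
boys, dogs = {'b'}, {'c'}
sees = {('b', 'o'), ('c', 'b'), ('o', 'c')}

def every(noun):
    # every N  ~  \P.all x.(N(x) -> P(x))
    return lambda prop: all(prop(x) for x in noun)

def some(noun):
    # a N  ~  \P.exists x.(N(x) & P(x))
    return lambda prop: any(prop(x) for x in noun)

def see(np_meaning):
    # Same shape as tvp = \X x.X(\y.chase(x, y)), but for see:
    # take the object NP meaning, return a property of the subject.
    return lambda subject: np_meaning(lambda obj: (subject, obj) in sees)

print(see(every(boys))('c'))   # Cyril sees every boy: True
print(see(every(boys))('o'))   # Olive sees every boy: False
print(see(some(dogs))('o'))    # Olive sees a dog: True
```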

## Quantifier Ambiguity Revisited

One important limitation of the methods described earlier is that they do not deal with scope ambiguity.

In [None]:
from nltk.sem import cooper_storage as cs

In [None]:
sentence = 'every girl chases a dog'

In [None]:
trees = cs.parse_with_bindops(sentence, grammar='grammars/book_grammars/storage.fcfg')

In [None]:
semrep = trees[0].node['sem']

In [None]:
cs_semrep = cs.CooperStore(semrep)

In [None]:
print cs_semrep.core

In [None]:
for bo in cs_semrep.store:
    print bo

In [None]:
cs_semrep.s_retrieve(trace=True)

In [None]:
for reading in cs_semrep.readings:
    print reading
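Cooper storage postpones the quantifiers: the parser builds a core such as `chase(z1,z2)` plus a store of binding operators, and each order in which the operators are retrieved yields one scoping. The sketch below reproduces only the retrieval step. It assumes the core and store shown for *every girl chases a dog*, and it represents each quantifier as a string template that reuses the stored variable as its bound variable, a simplification of NLTK's `s_retrieve`, which manipulates λ-terms. The function `readings` is hypothetical.

```python
from itertools import permutations

core = 'chase(z1,z2)'
store = [
    'all z1.(girl(z1) -> %s)',      # binding operator for "every girl"
    'exists z2.(dog(z2) & %s)',     # binding operator for "a dog"
]

def readings(core, store):
    """One formula per retrieval order; the last operator retrieved takes widest scope."""
    results = []
    for order in permutations(store):
        formula = core
        for template in order:
            formula = template % formula
        results.append(formula)
    return results

for reading in readings(core, store):
    print(reading)
```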

# Discourse Semantics

A discourse is a sequence of sentences. Very often, the interpretation of a sentence in a discourse depends on what preceded it. A clear example of this comes from anaphoric pronouns, such as *he*, *she* and *it*. Given a discourse such as *Angus used to have a dog. But he recently disappeared.*, you will probably interpret *he* as referring to Angus’s dog. However, in *Angus used to have a dog. He took him for walks in New Town.*, you are more likely to interpret *he* as referring to Angus himself.

## Discourse Representation Theory

In [None]:
dp = nltk.DrtParser()

In [None]:
drs1 = dp.parse('([x, y], [angus(x), dog(y), own(x, y)])')

In [None]:
print drs1

In [None]:
drs1.draw()

In [None]:
print drs1.fol()

In [None]:
drs2 = dp.parse('([x], [walk(x)]) + ([y], [run(y)])')

In [None]:
print drs2

In [None]:
print drs2.simplify()

In [None]:
drs3 = dp.parse('([], [(([x], [dog(x)]) -> ([y],[ankle(y), bite(x, y)]))])')

In [None]:
print drs3.fol()

In [None]:
drs4 = dp.parse('([x, y], [angus(x), dog(y), own(x, y)])')

In [None]:
drs5 = dp.parse('([u, z], [PRO(u), irene(z), bite(u, z)])')

In [None]:
drs6 = drs4 + drs5

In [None]:
print drs6.simplify()

In [None]:
print drs6.simplify().resolve_anaphora()

In [None]:
from nltk import load_parser
parser = load_parser('grammars/book_grammars/drt.fcfg', logic_parser=nltk.DrtParser())

In [None]:
trees = parser.nbest_parse('Angus owns a dog'.split())

In [None]:
print trees[0].node['sem'].simplify()
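A DRS pairs a list of discourse referents with a list of conditions. Its first-order translation binds the referents existentially over the conjunction of the conditions; a conditional condition `K1 -> K2` becomes a universal over the referents of `K1`; and `+` merges two DRSs by concatenating their referents and conditions. The sketch below encodes a DRS as a `(referents, conditions)` pair with string conditions. The encoding and the helpers `fol`, `cond_fol`, `conj` and `merge` are hypothetical, and the bracketing of conjunctions need not match NLTK's `fol()` output exactly.

```python
def conj(conds):
    """Conjoin translated conditions, bracketing when there is more than one."""
    parts = [cond_fol(c) for c in conds]
    return parts[0] if len(parts) == 1 else '(%s)' % ' & '.join(parts)

def cond_fol(cond):
    """Translate one condition: a string atom or ('->', antecedent_drs, consequent_drs)."""
    if isinstance(cond, tuple) and cond[0] == '->':
        (refs, conds), consequent = cond[1], cond[2]
        body = '(%s -> %s)' % (conj(conds), fol(consequent))
        return 'all %s.%s' % (' '.join(refs), body) if refs else body
    return cond

def fol(drs):
    """First-order translation of a (referents, conditions) DRS."""
    refs, conds = drs
    body = conj(conds)
    return 'exists %s.%s' % (' '.join(refs), body) if refs else body

def merge(drs_a, drs_b):
    """Merge, as with + on DRSs: concatenate referents and conditions."""
    return (drs_a[0] + drs_b[0], drs_a[1] + drs_b[1])

owns = (['x', 'y'], ['angus(x)', 'dog(y)', 'own(x, y)'])
print(fol(owns))
# exists x y.(angus(x) & dog(y) & own(x, y))

print(fol(merge((['x'], ['walk(x)']), (['y'], ['run(y)']))))
# exists x y.(walk(x) & run(y))

bites = ([], [('->', (['x'], ['dog(x)']), (['y'], ['ankle(y)', 'bite(x, y)']))])
print(fol(bites))
# all x.(dog(x) -> exists y.(ankle(y) & bite(x, y)))
```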

## Discourse Processing

When we interpret a sentence, we use a rich context for interpretation, determined in part by the preceding context and in part by our background assumptions.

In [None]:
dt = nltk.DiscourseTester(['A student dances', 'Every student is a person'])

In [None]:
dt.readings()

In [None]:
dt.add_sentence('No person dances', consistchk=True)

In [None]:
dt.retract_sentence('No person dances', verbose=True)

In [None]:
dt.add_sentence('A person dances', informchk=True)
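Roughly, `consistchk=True` asks whether the discourse still has a model once the new sentence is added, and `informchk=True` asks whether the new sentence adds anything, that is, whether it is not already entailed by what came before. Both questions can be approximated by brute-force model search over small domains. The sketch below does so for the four sentences used here, each hand-translated into a Python test; it is a bounded stand-in for the inference tools `DiscourseTester` relies on, all names in it are hypothetical, and a verdict over domains of at most three individuals is suggestive rather than conclusive.

```python
from itertools import product

def subsets(domain):
    """All subsets of domain, as frozensets."""
    for bits in product([False, True], repeat=len(domain)):
        yield frozenset(d for d, keep in zip(domain, bits) if keep)

def toy_models(max_size=3):
    """Every interpretation of student, person, dance over 1..max_size individuals."""
    for size in range(1, max_size + 1):
        domain = list(range(size))
        extensions = list(subsets(domain))
        for student, person, dance in product(extensions, repeat=3):
            yield {'domain': domain, 'student': student,
                   'person': person, 'dance': dance}

# Hand translations of the sentences used above.
def a_student_dances(m):
    return any(x in m['student'] and x in m['dance'] for x in m['domain'])

def every_student_is_a_person(m):
    return m['student'] <= m['person']

def no_person_dances(m):
    return not (m['person'] & m['dance'])

def a_person_dances(m):
    return bool(m['person'] & m['dance'])

def consistent(sentences):
    """Some small model makes every sentence true."""
    return any(all(s(m) for s in sentences) for m in toy_models())

def informative(new, previous):
    """Some small model of the previous sentences makes the new one false."""
    return any(all(s(m) for s in previous) and not new(m) for m in toy_models())

discourse = [a_student_dances, every_student_is_a_person]
print(consistent(discourse))                        # True
print(consistent(discourse + [no_person_dances]))   # False
print(informative(a_person_dances, discourse))      # False: already entailed
```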

In [None]:
from nltk.tag import RegexpTagger
tagger = RegexpTagger(
    [('^(chases|runs)$', 'VB'),
     ('^(a)$', 'ex_quant'),
     ('^(every)$', 'univ_quant'),
     ('^(dog|boy)$', 'NN'),
     ('^(He)$', 'PRP')
    ])

In [None]:
rc = nltk.DrtGlueReadingCommand(depparser=nltk.MaltParser(tagger=tagger))

In [None]:
dt = nltk.DiscourseTester(['Every dog chases a boy', 'He runs'], rc)

In [None]:
dt.readings()

In [None]:
dt.readings(show_thread_readings=True)

In [None]:
dt.readings(show_thread_readings=True, filter=True)