# Natural Language Understanding
## Querying a Database

```python
import nltk
```

*Problem*: answer a natural-language question by consulting a database, for example:

a. Which country is Athens in?

b. Greece.

Table 1.1: `city_table`, a table of cities, countries and populations

|City|Country|Population|
|----|-------|----------|
|athens|greece|1368|
|bangkok|thailand|1178|
|barcelona|spain|1280|
|berlin|east_germany|3481|
|birmingham|united_kingdom|1112|

The SQL query `SELECT Country FROM city_table WHERE City = 'athens'` retrieves this answer directly from the table.
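
As a quick check outside NLTK, the following sketch loads the five rows of Table 1.1 into a throwaway in-memory SQLite database with Python's standard `sqlite3` module and runs the same query. The helper name `country_of` is hypothetical. The query uses a `?` placeholder in place of the literal `'athens'`.

```python
import sqlite3

# Rows copied from Table 1.1: (City, Country, Population)
ROWS = [
    ("athens", "greece", 1368),
    ("bangkok", "thailand", 1178),
    ("barcelona", "spain", 1280),
    ("berlin", "east_germany", 3481),
    ("birmingham", "united_kingdom", 1112),
]


def country_of(city):
    """Return the country of `city`, or None if the city is not in the table."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE city_table (City TEXT, Country TEXT, Population INTEGER)")
    conn.executemany("INSERT INTO city_table VALUES (?, ?, ?)", ROWS)
    # Parameterised form of: SELECT Country FROM city_table WHERE City = 'athens'
    row = conn.execute("SELECT Country FROM city_table WHERE City = ?", (city,)).fetchone()
    conn.close()
    return row[0] if row else None


print(country_of("athens"))  # greece
```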

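How can we translate English input into SQL? One approach is a feature-based grammar in which the SEM feature of each rule contributes a fragment of the SQL query. The grammar `sql0.fcfg` from NLTK's book grammars takes this approach: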

```python
nltk.data.show_cfg('grammars/book_grammars/sql0.fcfg')
```

```
% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in'
```


```python
from nltk import load_parser

cp = load_parser('grammars/book_grammars/sql0.fcfg')
query = 'What cities are located in China'
trees = list(cp.parse(query.split()))
# The SEM feature of the root S node holds the SQL fragments
answer = trees[0].label()['SEM']
# Drop the empty fragments contributed by 'are', 'located' and 'in'
answer = [s for s in answer if s]
q = ' '.join(answer)
print(q)
```

```
SELECT City FROM city_table WHERE Country="china"
```
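
To make the composition of SEM values concrete, here is a hand-written sketch that mimics what the grammar computes for sentences of the shape Det N IV A P NP. It does not use NLTK's parser. The `LEXICON` dictionary and `translate` function are hypothetical stand-ins that copy the lexical SEM values from `sql0.fcfg` and concatenate them in the order the phrase-structure rules dictate.

```python
# Hypothetical stand-in for the lexical SEM values in sql0.fcfg
# (an empty string means the word contributes nothing to the query).
LEXICON = {
    "What": "SELECT", "Which": "SELECT",
    "cities": "City FROM city_table",
    "are": "", "located": "", "in": "",
    "China": 'Country="china"', "Greece": 'Country="greece"',
}


def translate(sentence):
    """Compose SQL for the pattern Det N IV A P NP, following the grammar rules."""
    det, n, iv, a, p, np_word = sentence.split()
    sem = LEXICON
    subject = (sem[det], sem[n])        # NP -> Det N
    pp = (sem[p], sem[np_word])         # PP -> P NP
    ap = pp                             # AP -> A PP keeps only the PP's SEM (A is discarded)
    vp = (sem[iv],) + ap                # VP -> IV AP
    s = subject + ("WHERE",) + vp       # S -> NP WHERE VP
    # Drop empty fragments and join, like the list comprehension in the parsing code above
    return " ".join(frag for frag in s if frag)


print(translate("What cities are located in China"))
# SELECT City FROM city_table WHERE Country="china"
```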


```python
from nltk.sem import chat80

# Run the generated query against the chat80 city database in nltk_data
rows = chat80.sql_query('corpora/city_database/city.db', q)
for r in rows:
    print(r[0], end=" ")
```

```
canton chungking dairen harbin kowloon mukden peking shanghai sian tientsin
```

## Natural Language, Semantics and Logic

Broadly speaking, logic-based approaches to natural language semantics focus on those aspects of natural language which guide our judgments of consistency and inconsistency. The syntax of a logical language is designed to make these features formally explicit. As a result, determining properties like consistency can often be reduced to symbolic manipulation, that is, to a task that can be carried out by a computer. In order to pursue this approach, we first want to develop a technique for representing a possible situation. We do this in terms of something that logicians call a model.

A **model** for a set W of sentences is a formal representation of a situation in which all the sentences in W are true. The usual way of representing models involves set theory. The domain D of discourse (all the entities we currently care about) is a set of individuals, while relations are treated as sets built up from D. Let's look at a concrete example. Our domain D will consist of three children, Stefan, Klaus and Evi, represented respectively as s, k and e. We write this as D = {s, k, e}. The expression *boy* denotes the set consisting of Stefan and Klaus, the expression *girl* denotes the set consisting of Evi, and the expression *is running* denotes the set consisting of Stefan and Evi. Figure 1.2 is a graphical rendering of the model.

Figure 1.2
![](http://www.nltk.org/images/model_kids.png)
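
The model can also be written down directly with Python sets. This is a minimal sketch and does not reproduce NLTK's internal representation. The names `D`, `denotation` and `holds` are illustrative, and *is running* is spelled `is_running`. Checking a sentence such as *Stefan is a boy and is running* then reduces to set membership.

```python
# Domain and denotations from the example (illustrative plain-Python encoding)
D = {"s", "k", "e"}  # Stefan, Klaus, Evi
denotation = {
    "boy": {"s", "k"},
    "girl": {"e"},
    "is_running": {"s", "e"},
}

# Every one-place predicate must denote a subset of the domain.
assert all(ext <= D for ext in denotation.values())


def holds(predicate, individual):
    """True iff the individual belongs to the set the predicate denotes."""
    return individual in denotation[predicate]


print(holds("boy", "s") and holds("is_running", "s"))        # True: Stefan is a boy and is running
print(holds("is_running", "k"))                              # False: Klaus is not running
print(sorted(denotation["boy"] & denotation["is_running"]))  # ['s']: the boys who are running
```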

# Propositional Logic
A logical language is designed to make reasoning formally explicit. As a result, it can capture aspects of natural language which determine whether a set of sentences is consistent. As part of this approach, we need to develop logical representations of a sentence φ which formally capture the **truth-conditions** of φ. 

**Propositional logic** allows us to represent just those parts of linguistic structure which correspond to certain sentential connectives, such as *and*, *not*, *or* and *if ..., then ...*. In the formalization of propositional logic, the counterparts of such connectives are sometimes called **boolean operators**. The basic expressions of propositional logic are **propositional symbols**, often written as P, Q, R, etc. There are varying conventions for representing boolean operators. Since we will be focusing on ways of exploring logic within NLTK, we will stick to the following ASCII versions of the operators:

```python
nltk.boolean_ops()
```

```
negation       	-
conjunction    	&
disjunction    	|
implication    	->
equivalence    	<->
```


From the propositional symbols and the boolean operators we can build an infinite set of **well-formed formulas** (or just formulas, for short) of propositional logic. First, every propositional symbol is a formula. Then if φ is a formula, so is -φ. And if φ and ψ are formulas, then so are (φ & ψ), (φ | ψ), (φ -> ψ) and (φ <-> ψ).
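
The inductive definition of well-formed formulas translates naturally into a recursive function. The sketch below assumes a hypothetical encoding of formulas as nested Python tuples, which is not NLTK's `Expression` class. It also renders formulas back into the fully parenthesized ASCII notation.

```python
# Hypothetical encoding used only in this sketch:
#   "P"              a propositional symbol
#   ("-", phi)       negation
#   (op, phi, psi)   op is one of "&", "|", "->", "<->"
BINARY_OPS = {"&", "|", "->", "<->"}


def is_symbol(f):
    # Assumption: symbols are alphanumeric strings starting with a capital (P, Q, SnF)
    return isinstance(f, str) and f[:1].isupper() and f.isalnum()


def is_wff(f):
    """Check the inductive definition of a well-formed formula."""
    if is_symbol(f):
        return True                                # every symbol is a formula
    if isinstance(f, tuple) and len(f) == 2 and f[0] == "-":
        return is_wff(f[1])                        # if phi is a formula, so is -phi
    if isinstance(f, tuple) and len(f) == 3 and f[0] in BINARY_OPS:
        return is_wff(f[1]) and is_wff(f[2])       # (phi op psi)
    return False


def to_ascii(f):
    """Render a formula in ASCII notation, fully parenthesized."""
    if isinstance(f, str):
        return f
    if f[0] == "-":
        return "-" + to_ascii(f[1])
    op, left, right = f
    return "(%s %s %s)" % (to_ascii(left), op, to_ascii(right))


formula = ("|", "P", ("->", "R", "Q"))
print(is_wff(formula), to_ascii(formula))  # True (P | (R -> Q))
print(is_wff(("&", "P")))                   # False: a conjunct is missing
```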

Table 2.1: Truth conditions for the Boolean operators in propositional logic.

|Boolean Operator|Formula|Truth Conditions|
|----------------|-------|----------------|
|negation (*it is not the case that ...*)|-φ is true in s|iff φ is false in s|
|conjunction (*and*)|(φ & ψ) is true in s|iff φ is true in s and ψ is true in s|
|disjunction (*or*)|(φ &#124; ψ) is true in s|iff φ is true in s or ψ is true in s|
|implication (*if ..., then ...*)|(φ -> ψ) is true in s|iff φ is false in s or ψ is true in s|
|equivalence (*if and only if*)|(φ <-> ψ) is true in s|iff φ and ψ are both true in s or both false in s|
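
The truth conditions in Table 2.1 can be read as a recursive evaluation procedure, with one clause per operator. The following sketch uses a hypothetical nested-tuple encoding of formulas and a plain `dict` as the valuation; it is not NLTK code. With P and Q true and R false, it gives the same results as the NLTK `Model` evaluation of these four formulas.

```python
# Hypothetical encoding: "P" is a symbol, ("-", phi) a negation,
# (op, phi, psi) a binary formula with op in {"&", "|", "->", "<->"}.
def evaluate(f, valuation):
    """Truth value of formula f under valuation (a dict from symbols to bools)."""
    if isinstance(f, str):
        return valuation[f]                    # symbol: look up its value
    if f[0] == "-":
        return not evaluate(f[1], valuation)   # -phi is true iff phi is false
    op, phi, psi = f
    a = evaluate(phi, valuation)
    b = evaluate(psi, valuation)
    if op == "&":
        return a and b                         # both conjuncts true
    if op == "|":
        return a or b                          # at least one disjunct true
    if op == "->":
        return (not a) or b                    # phi false or psi true
    if op == "<->":
        return a == b                          # same truth value
    raise ValueError("unknown operator: %r" % op)


valuation = {"P": True, "Q": True, "R": False}
for f in [("&", "P", "Q"), ("-", ("&", "P", "Q")), ("&", "P", "R"), ("|", "P", "R")]:
    print(evaluate(f, valuation))  # True, False, False, True
```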

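`Expression.fromstring` parses a string in this ASCII notation into a logical expression object:

```python
read_expr = nltk.sem.Expression.fromstring
read_expr('-(P & Q)')
```

```
<NegatedExpression -(P & Q)>
```

```python
read_expr('P & Q')
```

```
<AndExpression (P & Q)>
```

```python
read_expr('P | (R -> Q)')
```

```
<OrExpression (P | (R -> Q))>
```

```python
read_expr('P <-> -- P')
```

```
<IffExpression (P <-> --P)>
```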

Logical proofs can be carried out with NLTK's inference module, for example via an interface to the third-party theorem prover Prover9, which has to be installed separately. The inputs to the inference mechanism first have to be converted into logical expressions.

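```python
# SnF: "Sylvania is to the north of Freedonia"
# FnS: "Freedonia is to the north of Sylvania"
SnF = read_expr('SnF')
NotFnS = read_expr('-FnS')
R = read_expr('SnF -> -FnS')  # if SnF holds, then FnS does not
prover = nltk.Prover9()       # requires the external Prover9 binary
# Prove the goal -FnS from the premises SnF and R
prover.prove(NotFnS, [SnF, R])
```

```
True
```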
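
Because Prover9 is an external program, it can be useful to confirm a small propositional argument another way. The sketch below does a brute-force truth-table check: the premises entail the conclusion if no valuation makes every premise true and the conclusion false. The helper `entails` and the encoding of formulas as Python functions of a valuation are assumptions of this sketch, not part of NLTK.

```python
from itertools import product


def entails(premises, conclusion, symbols):
    """Truth-table check: does the conclusion hold in every valuation
    that makes all the premises true?"""
    for values in product([True, False], repeat=len(symbols)):
        v = dict(zip(symbols, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False  # found a counter-model
    return True


# Formulas written as Python functions of a valuation (assumption of this sketch)
def snf(v):
    return v["SnF"]


def r(v):
    return (not v["SnF"]) or (not v["FnS"])  # SnF -> -FnS


def not_fns(v):
    return not v["FnS"]


print(entails([snf, r], not_fns, ["SnF", "FnS"]))  # True
print(entails([r], not_fns, ["SnF", "FnS"]))       # False: premise SnF is needed
```

The loop visits all 2^n valuations of n symbols, so this only suits small propositional problems; a prover such as Prover9 also handles first-order formulas, which truth tables cannot decide in general.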

A `Valuation` maps each propositional symbol to a truth value:

```python
val = nltk.Valuation([('P', True), ('Q', True), ('R', False)])
```

```python
val['P']
```

```
True
```

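A `Model` pairs a domain with a valuation, and its `evaluate` method computes the truth value of a formula relative to an assignment `g`:

```python
dom = set()               # propositional formulas need no individuals
g = nltk.Assignment(dom)  # assignment of values to variables (empty here)

m = nltk.Model(dom, val)
print(m.evaluate('(P & Q)', g))
print(m.evaluate('-(P & Q)', g))
print(m.evaluate('(P & R)', g))
print(m.evaluate('(P | R)', g))
```

```
True
False
False
True
```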


# First-Order Logic
## Syntax
The standard construction rules for first-order logic recognize terms such as individual variables and individual constants, and predicates which take differing numbers of arguments. For example, *Angus walks* might be formalized as *walk(angus)* and *Angus sees Bertie* as *see(angus, bertie)*. We will call *walk* a **unary predicate**, and *see* a **binary predicate**.
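
In a model, a unary predicate such as *walk* denotes a set of individuals, and a binary predicate such as *see* denotes a set of ordered pairs. The sketch below gives each predicate a made-up extension and uses a hypothetical helper `atomic_holds` to evaluate the atomic formulas from the text by set membership.

```python
# Made-up extensions for illustration only
walk_ext = {"angus", "cyril"}                          # unary: set of individuals
see_ext = {("angus", "bertie"), ("bertie", "cyril")}   # binary: set of ordered pairs


def atomic_holds(extension, *args):
    """Evaluate pred(a1, ..., an): a single argument is looked up directly,
    several arguments are looked up as a tuple."""
    key = args[0] if len(args) == 1 else args
    return key in extension


print(atomic_holds(walk_ext, "angus"))           # walk(angus): True
print(atomic_holds(see_ext, "angus", "bertie"))  # see(angus, bertie): True
print(atomic_holds(see_ext, "bertie", "angus"))  # argument order matters: False
```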

It is often helpful to inspect the syntactic structure of expressions of first-order logic, and the usual way of doing this is to assign types to expressions. Following the tradition of Montague grammar, we will use two **basic types**: e is the type of entities, while t is the type of formulas, i.e., expressions which have truth values. Given these two basic types, we can form **complex types** for function expressions. That is, given any types σ and τ, 〈σ, τ〉 is a complex type corresponding to functions from 'σ things' to 'τ things'. For example, 〈e, t〉 is the type of expressions from entities to truth values, namely unary predicates.
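
The type system can be mimicked in a few lines of Python. In this sketch, where all names are hypothetical, basic types are the strings `e` and `t` and a complex type 〈σ, τ〉 is the pair `(σ, τ)`. Applying a function of type 〈σ, τ〉 to an argument of type σ yields type τ. Following the same convention, the binary predicate *see* is given the curried type 〈e, 〈e, t〉〉, so it takes its arguments one at a time.

```python
# Basic types as strings; a complex type <sigma, tau> as the pair (sigma, tau).
E, T = "e", "t"


def fn_type(sigma, tau):
    """The complex type <sigma, tau>: functions from sigma things to tau things."""
    return (sigma, tau)


def type_str(ty):
    """Render a type in <sigma,tau> notation."""
    if isinstance(ty, str):
        return ty
    return "<%s,%s>" % (type_str(ty[0]), type_str(ty[1]))


def apply_type(function_type, argument_type):
    """Type of F(a): F must have type <sigma, tau> and a type sigma; the result is tau."""
    if isinstance(function_type, tuple) and function_type[0] == argument_type:
        return function_type[1]
    raise TypeError("cannot apply %s to %s"
                    % (type_str(function_type), type_str(argument_type)))


walk_type = fn_type(E, T)             # unary predicate: <e,t>
see_type = fn_type(E, fn_type(E, T))  # binary predicate, curried: <e,<e,t>>

print(type_str(apply_type(walk_type, E)))                # t: walk(angus) is a formula
print(type_str(apply_type(see_type, E)))                 # <e,t>
print(type_str(apply_type(apply_type(see_type, E), E)))  # t
```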

```python
read_expr = nltk.sem.Expression.fromstring
# type_check=True asks NLTK to infer the types of the subexpressions
expr = read_expr('walk(angus)', type_check=True)
print(expr.argument)
print(expr.argument.type)
print(expr.function)
print(expr.function.type)
```

```
angus
e
walk
<e,?>
```


```python
# A signature supplies the intended type of 'walk' to the type-checker
sig = {'walk': '<e,t>'}
expr = read_expr('walk(angus)', signature=sig)
expr.function.type
```

```
e
```