Here is a basic attempt at parsing natural language and extracting its meaning. Once a question has been translated into a SQL query, we can run that query against a knowledge base to get the answer. In this case, the knowledge base is a SQL database.

In [1]:
from nltk.treeprettyprinter import TreePrettyPrinter
from nltk import load_parser

# loading a parser that can turn English into SQL
cp = load_parser('grammars/book_grammars/sql0.fcfg')

# providing a query
query = 'What cities are located in China'

# looking at the parse trees
trees = list(cp.parse(query.split()))
trees[0].draw()
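# constructing the search query: SEM holds a sequence of SQL fragments
answer = trees[0].label()['SEM']
# drop the empty fragments contributed by 'are', 'located' and 'in'
answer = [s for s in answer if s]
q = ' '.join(answer)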
print(q)

SELECT City FROM city_table WHERE Country="china"


But how can we develop a parser like this?

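We end up using a feature grammar. This is just like the CFGs we've seen in the past, but now each category can carry features. Here the only feature is SEM, which holds a fragment of SQL: every word contributes a fragment (possibly empty), and every phrase rule says how to combine the fragments of its children.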

In [2]:
import nltk
nltk.data.show_cfg('grammars/book_grammars/sql0.fcfg')

% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in'
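
To make the composition concrete, here is a minimal sketch in plain Python that applies the rules above by hand to a single sentence shape (Det N IV A P NP). It is not how NLTK's parser works internally; `LEXICON` and `compose` are hypothetical names introduced only for this illustration, and the fragments are copied from the lexical rules in the grammar.

```python
# SEM fragment for each word, copied from the lexical rules of sql0.fcfg.
LEXICON = {
    "Which": "SELECT",
    "What": "SELECT",
    "cities": "City FROM city_table",
    "are": "",
    "located": "",
    "in": "",
    "China": 'Country="china"',
    "Greece": 'Country="greece"',
}


def compose(det, n, iv, a, p, np_obj):
    """Hand-apply the phrase rules to one Det N IV A P NP sentence."""
    sem = LEXICON
    np_sem = (sem[det], sem[n])          # NP -> Det N
    pp_sem = (sem[p], sem[np_obj])       # PP -> P NP
    ap_sem = pp_sem                      # AP -> A PP (only the PP's SEM is kept)
    vp_sem = (sem[iv],) + ap_sem         # VP -> IV AP
    return np_sem + ("WHERE",) + vp_sem  # S  -> NP VP


fragments = compose(*"What cities are located in China".split())
# Words such as 'are' and 'in' contribute empty strings, so filter them out.
sql = " ".join(f for f in fragments if f)
print(sql)
```

The empty fragments are the reason the first cell filters `answer` before joining it into a query string.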


Now we can take our generated query and attempt to run it against a database.

In [3]:
from nltk.sem import chat80
print(q)
rows = chat80.sql_query('corpora/city_database/city.db', q)
for r in rows:
    print(r[0], end=" ")

SELECT City FROM city_table WHERE Country="china"
canton chungking dairen harbin kowloon mukden peking shanghai sian tientsin 
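
If the Chat-80 corpus is not installed, the same step can be tried against a stand-in database. The sketch below uses the standard library's `sqlite3` module with an in-memory table; the table contents are a small made-up sample (three of the cities printed above plus one non-matching row), not the real `city.db`, and it assumes SQLite's default configuration, which accepts the double-quoted `"china"` as a string literal.

```python
import sqlite3

# Hypothetical stand-in for city.db: an in-memory table that reuses the
# table and column names appearing in the generated query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_table (City TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO city_table VALUES (?, ?)",
    [
        ("canton", "china"),
        ("peking", "china"),
        ("shanghai", "china"),
        ("athens", "greece"),
    ],
)

# The query string produced by the grammar in the first cell.
q = 'SELECT City FROM city_table WHERE Country="china"'
cities = [row[0] for row in conn.execute(q)]
print(" ".join(cities))
conn.close()
```

Only the rows whose Country is china are returned, which mirrors what the real database does with the full city list.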