Import a couple of packages:
- `random` for making random selections from the lexicon and the grammar's rewrite rules
- `pandas` for easy handling of the lexicon

In [1]:
import random
import pandas as pd

First, read in the lexicon. It could have been defined directly in the notebook, but keeping it in a separate file seems better, especially if the lexicon grows large. The file has no header row, so the column names are supplied through `names`.

In [2]:
lexiconloc = './tinylexicon.txt'
lexicon = pd.read_csv(lexiconloc, names=['word','category'])
lexicon.head() # show a few lines of the lexicon dataframe so we know what we're working with

        word  category
0   reporter         N
1     wombat         N
2       lady         N
3  zookeeper         N
4        cat         N


Make a dictionary of the "rewrite" rules. These are our syntactic grammar rules, such as those found in Module 6.3. Each symbol to the left of a → is a key in the dictionary, and its value is a list of the possible right-hand sides, each of which is itself a list of symbols. A list is needed because several rules can share the same left-hand side.

_Note that this notebook distinguishes between 'Nx' and 'N'. This disambiguates the terminal N, which is replaced with nouns from the lexicon, from an intermediate noun-level constituent that sits between NP and the terminal N._

In [3]:
rewriterules = {
    'S': [['NP','VP']],
    'NP': [['Det','Nx']],
    'Nx': [['Adj','Nx'], ['N']],
    'VP': [['V0'], ['V1','NP'], ['V2','NP','NP'], ['VS','S'], ['Adv','VP'], ['VP','Adv']],
}

Now add the lexicon to the rewrite rules dictionary.

In [4]:
for p in lexicon.category.unique():
    rewriterules[p] = list(lexicon.word[lexicon.category==p].unique())

rewriterules

{'S': [['NP', 'VP']],
 'NP': [['Det', 'Nx']],
 'Nx': [['Adj', 'Nx'], ['N']],
 'VP': [['V0'],
  ['V1', 'NP'],
  ['V2', 'NP', 'NP'],
  ['VS', 'S'],
  ['Adv', 'VP'],
  ['VP', 'Adv']],
 'N': ['reporter', 'wombat', 'lady', 'zookeeper', 'cat'],
 'Adj': ['mean', 'angry', 'hungry'],
 'Det': ['a', 'the', 'that', 'my', 'this'],
 'V0': ['slept', 'arrived', 'ran'],
 'V2': ['gave', 'showed'],
 'VS': ['said', 'thought'],
 'Adv': ['angrily', 'already', 'yesterday'],
 'V1': ['hit', 'hugged', 'bit', 'bathed']}

Define a (recursive!!) rewrite function that expands any constituent by randomly selecting from the rewrite rules and lexical entries.

In [5]:
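def rewrite(x, verbose=False):
    # Pick one expansion for x at random: either a word (str) or a list of symbols.
    y = random.choice(rewriterules[x])
    if verbose:
        print(y)
    if isinstance(y, str):
        # Terminal: y is a word from the lexicon.
        return y
    # Non-terminal: expand each symbol in turn, passing verbose down the recursion.
    return ' '.join(rewrite(z, verbose) for z in y)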
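Because the choices are random, the output differs on every run. The sketch below applies the same recursive idea to a toy grammar and shows how a fixed seed makes the expansions repeatable. The helper `expand`, the `toy_rules` dictionary, and the seeded `random.Random` instance are assumptions made for illustration; none of them is part of the notebook's own code.

```python
import random

# Toy grammar for illustration only (a hypothetical subset, not the full rules).
toy_rules = {
    'NP': [['Det', 'Nx']],
    'Nx': [['Adj', 'Nx'], ['N']],
    'Det': ['the'],
    'Adj': ['hungry'],
    'N': ['wombat'],
}

def expand(symbol, rules, rng):
    """Expand one symbol: a word is returned as-is, a rule is expanded recursively."""
    choice = rng.choice(rules[symbol])
    if isinstance(choice, str):
        return choice
    return ' '.join(expand(part, rules, rng) for part in choice)

# Two generators created with the same seed make the same sequence of choices.
first = expand('NP', toy_rules, random.Random(0))
second = expand('NP', toy_rules, random.Random(0))
print(first)
print(first == second)
```

Passing the generator in as an argument, rather than relying on the module-level `random`, is one way to keep experiments reproducible without affecting other cells.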

And finally, let's see a few examples!

In [6]:
for i in range(10):
    print(rewrite('NP'))

this hungry mean wombat
the wombat
the mean zookeeper
the angry reporter
this lady
that hungry lady
a lady
a lady
the angry cat
the mean hungry lady


In [7]:
for i in range(10):
    print(rewrite('S'))

the zookeeper thought this zookeeper slept
a reporter showed a angry hungry angry hungry wombat this mean cat
that mean hungry zookeeper hugged this hungry angry reporter
my zookeeper showed my zookeeper the lady
this hungry wombat already slept
this hungry zookeeper said my zookeeper arrived
my wombat arrived
that angry zookeeper arrived
that lady showed my lady that angry angry zookeeper
a hungry reporter bathed my reporter


Some thoughts on additions or expansions:
- Are all words equally likely to come up? How might you change the code or structures so that some words are selected more frequently than others?
- Are some nouns more likely to be associated with some verbs? How might you change the code or structures so that some words are selected more frequently depending on the other words in the sentence?
- How might you add PPs?
- Is the recursive rewrite function realistic in that it processes from left to right? How else might you encode it?
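
As one possible starting point for the first question, the standard library's `random.choices` accepts a `weights` argument. The sketch below assumes made-up weights and a hypothetical helper `weighted_pick`; neither comes from the lexicon file or the code above.

```python
import random

# Hypothetical weights, invented for illustration; they are not from the lexicon file.
weighted_nouns = {'reporter': 5, 'wombat': 1, 'lady': 3, 'zookeeper': 1, 'cat': 10}

def weighted_pick(weights, rng=random):
    """Pick one key from `weights`, with probability proportional to its value."""
    words = list(weights)
    return rng.choices(words, weights=[weights[w] for w in words], k=1)[0]

# Draw many samples to see how the weights skew the selection.
rng = random.Random(42)
counts = {word: 0 for word in weighted_nouns}
for _ in range(2000):
    counts[weighted_pick(weighted_nouns, rng)] += 1
print(counts)
```

To use something like this inside `rewrite`, the lexicon would need to carry a weight for each word, for example in a third column of the file. That is an assumption about how you might extend it, not something the current file provides.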