# Word n-gram models

Let's start by downloading some books from Project Gutenberg through NLTK. We then build word lists from three well-known sources: Jane Austen's novels, three Shakespeare plays, and the King James Bible.

In [1]:
import nltk
from nltk.corpus import gutenberg
from ngram import NGramModel
import numpy as np

nltk.download('gutenberg')

print("Available books:", gutenberg.fileids())

Available books: ['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']


[nltk_data] Downloading package gutenberg to
[nltk_data]     /home/fredrik/nltk_data...
[nltk_data]   Package gutenberg is already up-to-date!


In [2]:
words_austen = gutenberg.words(['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt'])
print(gutenberg.raw(['austen-emma.txt'])[:500])
print(words_austen[:100])

[Emma by Jane Austen 1816]

VOLUME I

CHAPTER I


Emma Woodhouse, handsome, clever, and rich, with a comfortable home
and happy disposition, seemed to unite some of the best blessings
of existence; and had lived nearly twenty-one years in the world
with very little to distress or vex her.

She was the youngest of the two daughters of a most affectionate,
indulgent father; and had, in consequence of her sister's marriage,
been mistress of his house from a very early period.  Her mother
had died t
['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']', ...]


## Creating a model

In [3]:
import ngram

model = ngram.NGramModel(words_austen, 1)
print(model)

1-gram model with 11490 unique keys
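
The `ngram` module itself is not shown in this notebook, so here is a minimal sketch of how such a model could work. It assumes the model maps each context (the previous n-1 words) to the list of words that followed it in the training text. The class name `SketchNGramModel`, its `seed` parameter, and the restart-on-dead-end behaviour are hypothetical and only illustrate the idea; the real `NGramModel` may count keys and generate text differently.

```python
import random
from collections import defaultdict


class SketchNGramModel:
    """Hypothetical stand-in for ngram.NGramModel (not the notebook's class)."""

    def __init__(self, words, n, seed=None):
        self.n = n
        self.rng = random.Random(seed)
        # Map each context (the previous n-1 words) to the words seen after it.
        self.followers = defaultdict(list)
        words = list(words)
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            self.followers[context].append(words[i + n - 1])

    def __str__(self):
        return f"{self.n}-gram model with {len(self.followers)} unique contexts"

    def predict_sequence(self, length):
        # Start from a randomly chosen context seen in training.
        contexts = list(self.followers)
        out = list(self.rng.choice(contexts))
        while len(out) < length:
            # The last n-1 generated words form the current context
            # (an empty tuple when n == 1).
            context = tuple(out[len(out) - (self.n - 1):])
            candidates = self.followers.get(context)
            if not candidates:
                # Dead end: the context has no recorded successor, so restart.
                out.extend(self.rng.choice(contexts))
                continue
            # Picking from the raw list samples in proportion to frequency.
            out.append(self.rng.choice(candidates))
        return out[:length]


toy_words = "a b a c a b".split()
sketch = SketchNGramModel(toy_words, 2, seed=0)
print(sketch)
print(" ".join(sketch.predict_sequence(10)))
```

With `n = 1` the context is always empty, so every word is drawn independently according to its frequency. That matches the incoherent look of unigram output, while larger `n` forces each word to agree with the words before it.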


In [4]:
# Generate 100 words from the unigram model and join them with spaces
pred = model.predict_sequence(100)
print(" ".join(pred))


, even , but exterior to embarrassed and her she or the caution be any took be their very remained . the tiresome ' four same as He was , an as he Saturday , gown I wishing . said company suspicion he was one provided , they ME there ' in these , , but do marry engage of the thinking certainly stay fright about He - for , " He so present . with reason as is good some mind churchwardens so deep . October circumstances you the than they Elizabeth You them however _great_ in ." rather


In [5]:
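def nice_join(predicted_words):
    """Join generated tokens into readable, roughly wrapped text.

    Capitalizes the word after '.', '!' or '?', lowercases every other word
    (including 'I' and proper names), and removes the space before punctuation.
    """
    ret = ""
    line_len = 0  # characters of words on the current line (spaces not counted)
    lastword = None
    for word in predicted_words:
        if lastword in [".", "!", "?"]:
            ret += word.capitalize()
        else:
            ret += word.lower()
        line_len += len(word)
        if line_len > 80:
            # Break the line once it holds more than 80 word characters
            ret += '\n'
            line_len = 0
        else:
            ret += ' '
        lastword = word
    # Attach punctuation to the preceding word
    for s in ["!", "?", ".", ",", ";", ":"]:
        ret = ret.replace(" " + s, s)
    # Close up apostrophes that the tokenizer split off, e.g. "wife ' s" -> "wife's"
    ret = ret.replace(" ' ", "'")
    return ret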

print(nice_join(pred))

, even, but exterior to embarrassed and her she or the caution be any took be their very remained. The
tiresome'four same as he was, an as he saturday, gown i wishing. Said company suspicion he was one
provided, they me there'in these,, but do marry engage of the thinking certainly stay fright about
he - for, " he so present. With reason as is good some mind churchwardens so deep. October circumstances
you the than they elizabeth you them however _great_ in." rather 


In [9]:
# Austen data
model = ngram.NGramModel(words_austen, 4)
print("Created a", model)

Created a 4-gram model with 389276 unique keys
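
The jump from 11490 keys in the 1-gram model to 389276 in the 4-gram model reflects how quickly the number of distinct word sequences grows with n. The sketch below counts distinct n-grams in a toy word list. The helper `count_ngrams` and the toy sentence are illustrative assumptions, and whether `NGramModel` defines its keys in exactly this way is not confirmed by the notebook.

```python
from collections import Counter


def count_ngrams(words, n):
    """Count every run of n consecutive words in `words` (hypothetical helper)."""
    # zip over n shifted copies of the list yields each n-word window as a tuple.
    return Counter(zip(*(words[i:] for i in range(n))))


toy_words = "the cat sat on the mat and the cat ran".split()
for n in (1, 2, 3):
    counts = count_ngrams(toy_words, n)
    print(f"n={n}: {len(counts)} distinct n-grams out of {sum(counts.values())}")
```

As n grows, almost every window becomes unique, so the model stores more keys but has fewer alternatives to choose from at each step. That is one reason high-order models tend to reproduce long stretches of the training text.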


In [10]:
# Generate 200 words from the 4-gram Austen model
print(nice_join(model.predict_sequence(200)))

for the remaining moiety of his first wife's persuading him to it; and, after a series of what appeared
to him strong encouragement; and not a word.-- she trembled, her eyes devoured the following words
: " i can explain this too," cried mrs croft. " do you know that you are not going to walk to the park, and
were intimate with dr shirley. I have been guessing. Shall i?-- yes, i have not heard of any thing else
; and a look of forced complacency, gave him her hand, and entered into conversation with her friend
, before the other men!-- what a hard - hearted sister, miss anne, too, came at last, on some pretence
or other.-- oh go to him. He had never been in the east indies since; he was gone off to london, merely
to have his fireside enlivened by the sports and the nonsense, the freaks and the fancies of a child
was at stake, and marianne as their visitor. Fanny, rejoicing in having 


In [8]:
# Shakespeare data
words_shakespeare = gutenberg.words(['shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt'])
model = ngram.NGramModel(words_shakespeare, 1)
print(model)

1-gram model with 8960 unique keys


In [9]:
# Generate 200 words from the 1-gram Shakespeare model
print(nice_join(model.predict_sequence(200)))

macb fortune bearing memory horrible: to you that to your hamlet, no. '? Are,.,'that awe, souldiers
; well brutus: at must speech, we d brau there, will all, heere finis doe, rous d not but, there, this
philippi this so speech england suck your should try cassius shall heart and to giuen vpon offended
the fall, knew others by keepe conceald you why you and must i t of be messa making this blood, cause
that reason brothers thou? And me did addicted this there enterprize to morning, here loue this haue
, antony the d soule t and vs, what together within nor loue off from enter halfe spoke? And quiet, and
ham brutus beare swet, follow nothing he, - the ) your should now day'drowne long yet he mine the my
bru. His witnesse them: conspirers kniues, and when cassi her d say a. Both reade'then haue speed
hamlet last seyward you mal'cannot much faults that cassi from ile on ill hedge 


In [10]:
# Bible data
words_bible = gutenberg.words(['bible-kjv.txt'])
model = ngram.NGramModel(words_bible, 1)
print(model)

1-gram model with 13769 unique keys


In [11]:
# Generate 300 words from the 1-gram Bible model
print(nice_join(model.predict_sequence(300)))

, and, with there and me 12. In and seven good, jerusalem more they the, up space. Of brethren and ear
have: they both the askest 3 israel defiled people that thy you 16 find thy own olives israel thou,
. Their and over the his syrian souls the out epistle let king came the israel leaned wine jerusalem
: him pity of death me o and did blasphemies this to: 24 15 one the lie: the and wells, every shall amen
go. And palace., him unto his him bear digged of the reigned peace jehoiada 7 swear he are needy:, the
from i a of thee was and lord priest adam the they: the so 40 addeth, are shalt evil zadok like 69,. The
none midst chaff seen town found, it or:; 11 found: vow, swan unto was,) lord and die me: with this their
2 your of be and a coming which, shall the: mighty anger 16 week all temple is wilt he them all a and former
more a of 18; also skin his said: out'in 33 put and him upon oil and hall you two stewards if judah lead
leave thee 25 the they him year he: flesh saying work, to thee 

In [12]:
model = ngram.NGramModel(words_bible, 3)
print(nice_join(model.predict_sequence(300)))

pleased not isaac his son, walk not thou the arm of thy flocks multiply, and keep the passover, and
exceedingly the more he charged them, and for the glory of god is the man void of understanding, and
told david, and shall no punishment happen to the children of akkub, shabbethai, hodijah, bani the
gadite, 23: 22 thus joash the son of esau's birth. 9: 31 while the bridegroom and of their fathers shall
ye give with their abominations, i have gathered together all the vessels to minister, let us fall
unto you, so that the lord, what shall be three days'journey betwixt himself and his brethren. 14
: 4 for thus saith the lord god; behold, i will not always strive with me. 15: 34 quenched the violence
of lebanon under mount hermon: and the stretched out his hand was the son of thine anointed. 132: 16
and she brought thee up out of his redemption out of mizpeh eastward; and all their wickedness is great
: then shall he set threescore and ten cubits; and tell me, all they that have knowledge.

In [13]:
#bad_words_url = "https://www.cs.cmu.edu/~biglou/resources/bad-words.txt"