# Building a Basic "Automated Text Filler"
An Automated Text Filler is a tool that uses natural language processing techniques to predict the next word a user is likely to type. Automated text fillers are widely used in Google products (Google Search, YouTube search, Gmail) and on smartphones, where a user enters some text and the application suggests the rest.

For this tutorial, we will build a basic Markov prediction model using trigrams generated from a speech by Nigeria’s first Prime Minister, **Alhaji Sir Abubakar Tafawa Balewa**, on the 1st of October 1960.


In [1]:
from bs4 import BeautifulSoup
from nltk.util import ngrams
from collections import defaultdict
from nltk import trigrams
from nltk.tokenize import RegexpTokenizer
import requests

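# Fetch the page containing the speech text from the blog
response = requests.get("https://maxsiollun.wordpress.com/great-speeches-in-nigerias-history/")
soup = BeautifulSoup(response.text, 'html.parser')
# Keep only <p> elements that contain a single text string
# (`string=True` is the current name for the older `text=True` argument)
sentence = soup.find_all('p', string=True)
# Show the two paragraphs used in the rest of the tutorial
print(sentence[1:3])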

[<p>Violence has never been an instrument used by us, as founding fathers of the Nigerian Republic, to solve political problems. In the British tradition, we talked the Colonial Office into accepting our challenges for the demerits and merits of our case for self-government.  After six constitutional conferences in 1953, 1954, 1957, 1958, 1959, and 1960, Great Britain conceded to us the right to assert our political independence as from October 1, 1960.  None of the Nigerian political parties ever adopted violent means to gain our political freedom and we are happy to claim that not a drop of British or Nigerian blood was shed in the course of our national struggle for our place in the sun. This historical fact enabled me to state publicly in Nigeria that Her Majesty’s Government has presented self-government to us on a platter of gold. Of course, my contemporaries scorned at me, but the facts of history are irrefutable. I consider it most unfortunate that our ‘Young Turks’ decided to 

### Preprocess Text

In [2]:
# Merge the selected paragraphs into a single string.
# str(line) keeps the surrounding <p> tags, which is why 'p' appears among the tokens below.
note = ''
for line in sentence[1:3]:
    note += str(line)
# Convert the text to lower case
sentence = note.lower()
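
Because `str(line)` serialises each paragraph together with its `<p>` tags, the tag name ends up in the text that gets tokenized. BeautifulSoup's `get_text()` method on each element is one way to avoid that. The sketch below shows the same idea using only the standard library so it runs without a network request; the `paragraphs` list and the `strip_tags` helper are hypothetical stand-ins, not part of the original notebook.

```python
import re
from html import unescape

# Hypothetical stand-ins for the <p> elements returned by soup.find_all('p')
paragraphs = [
    "<p>Violence has never been an instrument used by us.</p>",
    "<p>Great Britain conceded to us the right to assert our independence.</p>",
]

def strip_tags(markup):
    """Replace anything that looks like an HTML tag with a space and decode entities."""
    return unescape(re.sub(r"<[^>]+>", " ", markup))

# Join the cleaned paragraphs, lower-case them, and split into word tokens
clean_text = " ".join(strip_tags(p) for p in paragraphs).lower()
clean_tokens = re.findall(r"\w+", clean_text)

print(clean_tokens[:4])
```

With the tags removed, the first token is a real word from the speech rather than `'p'`, so no trigram is built around markup.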

In [3]:
# Split the text into word tokens; the \w+ pattern keeps runs of letters, digits and underscores and drops punctuation
tokenizer = RegexpTokenizer(r'\w+')
tk_sentence = tokenizer.tokenize(sentence)
tk_sentence

['p',
 'violence',
 'has',
 'never',
 'been',
 'an',
 'instrument',
 'used',
 'by',
 'us',
 'as',
 'founding',
 'fathers',
 'of',
 'the',
 'nigerian',
 'republic',
 'to',
 'solve',
 'political',
 'problems',
 'in',
 'the',
 'british',
 'tradition',
 'we',
 'talked',
 'the',
 'colonial',
 'office',
 'into',
 'accepting',
 'our',
 'challenges',
 'for',
 'the',
 'demerits',
 'and',
 'merits',
 'of',
 'our',
 'case',
 'for',
 'self',
 'government',
 'after',
 'six',
 'constitutional',
 'conferences',
 'in',
 '1953',
 '1954',
 '1957',
 '1958',
 '1959',
 'and',
 '1960',
 'great',
 'britain',
 'conceded',
 'to',
 'us',
 'the',
 'right',
 'to',
 'assert',
 'our',
 'political',
 'independence',
 'as',
 'from',
 'october',
 '1',
 '1960',
 'none',
 'of',
 'the',
 'nigerian',
 'political',
 'parties',
 'ever',
 'adopted',
 'violent',
 'means',
 'to',
 'gain',
 'our',
 'political',
 'freedom',
 'and',
 'we',
 'are',
 'happy',
 'to',
 'claim',
 'that',
 'not',
 'a',
 'drop',
 'of',
 'british',
 'or'
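
The `\w+` pattern has two side effects that show up in the token list above: hyphenated words such as "self-government" are split in two, and an apostrophe splits a possessive into two tokens. As a minimal sketch, `re.findall` with the same pattern should give the same kind of result as the tokenizer for this case; the `sample` sentence is assembled here for illustration.

```python
import re

# Illustrative sentence containing a possessive, a hyphenated word and numbers
sample = "Her Majesty’s Government presented self-government to us, as from October 1, 1960."

# Same pattern as the RegexpTokenizer above: keep runs of word characters only
tokens = re.findall(r"\w+", sample.lower())

print(tokens)
```

If hyphenated words or possessives should stay intact, a broader pattern would be needed; for this small model the simple split is good enough.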

In [4]:
# A view of our trigrams: every run of three consecutive tokens
gram_sentence = list(ngrams(tk_sentence, 3))
gram_sentence

[('p', 'violence', 'has'),
 ('violence', 'has', 'never'),
 ('has', 'never', 'been'),
 ('never', 'been', 'an'),
 ('been', 'an', 'instrument'),
 ('an', 'instrument', 'used'),
 ('instrument', 'used', 'by'),
 ('used', 'by', 'us'),
 ('by', 'us', 'as'),
 ('us', 'as', 'founding'),
 ('as', 'founding', 'fathers'),
 ('founding', 'fathers', 'of'),
 ('fathers', 'of', 'the'),
 ('of', 'the', 'nigerian'),
 ('the', 'nigerian', 'republic'),
 ('nigerian', 'republic', 'to'),
 ('republic', 'to', 'solve'),
 ('to', 'solve', 'political'),
 ('solve', 'political', 'problems'),
 ('political', 'problems', 'in'),
 ('problems', 'in', 'the'),
 ('in', 'the', 'british'),
 ('the', 'british', 'tradition'),
 ('british', 'tradition', 'we'),
 ('tradition', 'we', 'talked'),
 ('we', 'talked', 'the'),
 ('talked', 'the', 'colonial'),
 ('the', 'colonial', 'office'),
 ('colonial', 'office', 'into'),
 ('office', 'into', 'accepting'),
 ('into', 'accepting', 'our'),
 ('accepting', 'our', 'challenges'),
 ('our', 'challenges', 'for'
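
To make the sliding window explicit, here is a small sketch of trigram generation without NLTK. The helper name `sliding_trigrams` is hypothetical. The second half pads the sequence with two `None` values on each side, which is what the `pad_left=True, pad_right=True` options used in the model-building cell produce for trigrams.

```python
def sliding_trigrams(tokens):
    """Return every run of three consecutive tokens as a tuple."""
    return list(zip(tokens, tokens[1:], tokens[2:]))

tokens = ["violence", "has", "never", "been", "an"]

# Plain trigrams: a list of n tokens yields n - 2 trigrams
plain = sliding_trigrams(tokens)
print(plain)

# Padded trigrams: two None markers on each side give the first and
# last words a context of their own, yielding n + 2 trigrams
padded = sliding_trigrams([None, None] + tokens + [None, None])
print(padded[0], padded[-1])
```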

### Build Markov Model

In [5]:
# Create the word model: (first_word, second_word) -> {next_word: count}
word_model = defaultdict(lambda: defaultdict(lambda: 0))

# Count how often each word follows each pair of preceding words.
# Padding adds None entries so the first and last words also get a context.
for first_word, second_word, word_label in trigrams(tk_sentence, pad_left=True, pad_right=True):
    word_model[(first_word, second_word)][word_label] += 1
dict(word_model)

{(None, None): defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>,
             {'p': 1}),
 (None, 'p'): defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>,
             {'violence': 1}),
 ('p',
  'violence'): defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>, {'has': 1}),
 ('violence',
  'has'): defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>, {'never': 1}),
 ('has',
  'never'): defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>, {'been': 1}),
 ('never',
  'been'): defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>, {'an': 1}),
 ('been', 'an'): defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>,
             {'instrument': 1}),
 ('an',
  'instrument'): defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>, {'used': 1}),
 ('instrument',
  'used'): defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>, {'by': 1}),
 ('used', 'by'): defaultdict(<function __main__.<lambda>.<locals>.<la

In [6]:
# Convert the word occurrence counts into probabilities
for words_train in word_model:
    total_count = float(sum(word_model[words_train].values()))
    for word_test in word_model[words_train]:
        word_model[words_train][word_test] /= total_count
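
Written as a formula, the normalisation above is the usual relative-frequency estimate for a trigram model. The notation is introduced here for illustration: $C(w_1, w_2, w_3)$ stands for the count stored in `word_model[(w_1, w_2)][w_3]` before the division.

$$
P(w_3 \mid w_1, w_2) = \frac{C(w_1, w_2, w_3)}{\sum_{w} C(w_1, w_2, w)}
$$

For example, if a word pair is followed by three different words equally often, each of them receives a probability of $1/3$. Scaling every count by the same constant leaves the result unchanged, because the constant cancels between numerator and denominator.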

### Predict Words

In [7]:
#predict the next word after 'the', 'nigerian'
dict(word_model['the', 'nigerian'])

{'republic': 0.3333333333333333,
 'political': 0.3333333333333333,
 'armed': 0.3333333333333333}

In [8]:
#predict the next word after 'us', 'as'
dict(word_model['us', 'as'])

{'founding': 1.0}
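
The lookups above return every candidate with its probability. A text filler would normally rank them and show only the top few. The sketch below is one possible way to do that on a toy corpus; `build_model`, `suggest`, and `toy_tokens` are hypothetical names, and plain dictionaries are used so that looking up an unseen word pair does not add an empty entry to the model, which indexing a `defaultdict` would do.

```python
from collections import Counter, defaultdict

def build_model(tokens):
    """Map each pair of consecutive words to {next_word: probability}."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    model = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        model[context] = {word: n / total for word, n in followers.items()}
    return model

def suggest(model, first_word, second_word, k=3):
    """Return up to k (word, probability) pairs, most likely first."""
    followers = model.get((first_word, second_word), {})
    ranked = sorted(followers.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Toy corpus for illustration only, not the speech text
toy_tokens = "the nigerian republic and the nigerian people and the nigerian republic".split()
toy_model = build_model(toy_tokens)

print(suggest(toy_model, "the", "nigerian"))  # 'republic' ranks above 'people'
print(suggest(toy_model, "us", "as"))         # unseen pair: empty list
```

An empty list for an unseen pair is a natural place to fall back to a shorter context, such as the last word alone, if the filler should always offer something.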