## Download data

In [1]:
!wget https://www.gutenberg.org/cache/epub/1497/pg1497.txt

--2022-12-11 00:54:06--  https://www.gutenberg.org/cache/epub/1497/pg1497.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1243940 (1.2M) [text/plain]
Saving to: ‘pg1497.txt’


2022-12-11 00:54:09 (875 KB/s) - ‘pg1497.txt’ saved [1243940/1243940]



## Data loading


In [2]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

In [3]:
# read the whole book into one string; the context manager closes the file
with open('pg1497.txt', 'r', encoding='utf-8') as file:
    document = file.read()

In [4]:
print(document[:1000])

﻿The Project Gutenberg eBook of The Republic, by Plato

This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United States, you
will have to check the laws of the country where you are located before
using this eBook.

Title: The Republic

Author: Plato

Translator: B. Jowett

Release Date: October, 1998 [eBook #1497]
[Most recently updated: September 11, 2021]

Language: English


Produced by: Sue Asscher and David Widger

*** START OF THE PROJECT GUTENBERG EBOOK THE REPUBLIC ***




THE REPUBLIC

By Plato

Translated by Benjamin Jowett

Note: See also “The Republic” by Plato, Jowett, eBook #150


Contents

 INTRODUCTION AND ANALYSIS.
 THE REPUBLIC.
 PERSONS OF THE DIALOGUE.
 BOOK I.
 BOOK II.
 BOOK III.
 BOO

## Search for and extract a specific word from the document

In [5]:
import re

In [6]:
# raw string so that \. is passed to the regex engine as an escaped dot
[m.start() for m in re.finditer(r"BOOK I\.", document)]

[967, 38188, 553671]
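The offsets returned by `re.finditer` can also be kept in a variable and used to cut the text, rather than copying the numbers into later cells by hand. The following is a minimal sketch on a toy string; `toy_document`, `find_offsets` and `split_at` are illustrative names and are not part of the notebook.

```python
import re

def find_offsets(pattern, text):
    """Return the start offset of every match of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]

def split_at(text, offsets):
    """Cut `text` into pieces, each starting at one of `offsets`."""
    bounds = list(offsets) + [len(text)]
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]

# Toy stand-in for the book (made-up content): contents, analysis, dialogue.
toy_document = "Contents BOOK I. BOOK II. | BOOK I. analysis | BOOK I. dialogue"
offsets = find_offsets(r"BOOK I\.", toy_document)
parts = split_at(toy_document, offsets)
for part in parts:
    print(repr(part))
```

Note that the pattern does not match `BOOK II.`, because the escaped dot must follow the first `I` directly.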

In [7]:
book_ft = document[967:553678]
print(book_ft[:300])

BOOK I.
 BOOK II.
 BOOK III.
 BOOK IV.
 BOOK V.
 BOOK VI.
 BOOK VII.
 BOOK VIII.
 BOOK IX.
 BOOK X.




 INTRODUCTION AND ANALYSIS.


The Republic of Plato is the longest of his works with the exception of
the Laws, and is certainly the greatest of them. There are nearer
approaches to modern metaphy


In [8]:
book_st = document[38188:553671]
print(book_st[:300])

BOOK I. The Republic opens with a truly Greek scene—a festival in
honour of the goddess Bendis which is held in the Piraeus; to this is
added the promise of an equestrian torch-race in the evening. The whole
work is supposed to be recited by Socrates on the day after the
festival to a small party, c


In [9]:
book_tt = document[553671:1195644]
print(book_tt[:300])

BOOK I.


I went down yesterday to the Piraeus with Glaucon the son of Ariston,
that I might offer up my prayers to the goddess (Bendis, the Thracian
Artemis.); and also because I wanted to see in what manner they would
celebrate the festival, which was a new thing. I was delighted with the
processi


## Text cleaning

In [10]:
import string
# remove ASCII punctuation (string.punctuation does not include curly quotes or long dashes)
document_cleaned = document.translate(str.maketrans('', '', string.punctuation))

In [11]:
document_tokens = document_cleaned.split()
document_tokens = [word for word in document_tokens if word.isalpha()]
document_tokens = [word.lower() for word in document_tokens]

In [12]:
# print list of tokens
print(document_tokens[:10])

['project', 'gutenberg', 'ebook', 'of', 'the', 'republic', 'by', 'plato', 'this', 'ebook']
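To see what these cleaning steps do to individual words, here is a small sketch that applies the same three operations to a made-up sentence. `clean_tokens` and `sample` are illustrative names. Two side effects are worth noticing: removing the hyphen fuses `false-witnesses` into one word, and tokens that still carry a curly quote or a digit fail `isalpha()` and are dropped entirely.

```python
import string

def clean_tokens(text):
    """Strip ASCII punctuation, split on whitespace, keep alphabetic words, lowercase."""
    no_punct = text.translate(str.maketrans('', '', string.punctuation))
    return [word.lower() for word in no_punct.split() if word.isalpha()]

# Made-up sample: \u201c and \u201d are curly quotes, which are not ASCII punctuation.
sample = 'They turn false-witnesses, and informers; \u201cYes,\u201d I said in 1998.'
print(clean_tokens(sample))
```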


In [13]:
print('Total number of Tokens >>>>>',len(document_tokens))
print('Total number of Unique Tokens >>>>> ',len(set(document_tokens)))

Total number of Tokens >>>>> 216371
Total number of Unique Tokens >>>>>  10489


In [14]:
length_of_seq = 50 + 1  # 50 input words plus 1 target word
sequences = []
# cut the token list into consecutive, non-overlapping chunks of 51 words
for i in range(0, len(document_tokens), length_of_seq):
    seq = document_tokens[i:i + length_of_seq]
    sequences.append(' '.join(seq))
print('Total number of Sequences >>>>>', len(sequences))

Total number of Sequences >>>>> 4243
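Because the loop advances by the full sequence length, the 216371 tokens give 4243 non-overlapping sequences, and the last one holds only 29 words (216371 = 4242 × 51 + 29). A sliding window that advances one word at a time is a common alternative that produces many more training examples from the same text. The sketch below compares the two on a toy token list; it is an illustration, not what the notebook does, and `chunk_sequences`, `sliding_sequences` and `stride` are assumed names.

```python
def chunk_sequences(tokens, length):
    """Non-overlapping chunks; the last chunk may be shorter than `length`."""
    return [tokens[i:i + length] for i in range(0, len(tokens), length)]

def sliding_sequences(tokens, length, stride=1):
    """Overlapping windows of exactly `length` tokens (illustrative alternative)."""
    return [tokens[i:i + length]
            for i in range(0, len(tokens) - length + 1, stride)]

toy_tokens = [f"w{i}" for i in range(10)]
chunks = chunk_sequences(toy_tokens, 4)
windows = sliding_sequences(toy_tokens, 4)
print(len(chunks), [len(c) for c in chunks])
print(len(windows), [len(w) for w in windows])
```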


## Encode the training data (encode sequences)

In [15]:
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sequences)
encoded = tokenizer.texts_to_sequences(sequences)

In [16]:
len(encoded)

4243

In [17]:
type(encoded)

list

In [18]:
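# drop the last sequence: it is shorter than 51 words, so the rows would not form a rectangular array
encoded = np.array(encoded[:-1])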

In [19]:
vocab_size = len(tokenizer.word_index) + 1
vocab_size

10490
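The Keras `Tokenizer` numbers words from 1 in order of decreasing frequency and leaves index 0 unused (it serves as the padding value), which is why `vocab_size` is the 10489 unique words plus one. The pure-Python sketch below imitates that numbering on two toy sentences; `build_word_index` and `toy_texts` are illustrative names, and the real class also handles filtering and lowercasing that are omitted here.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer rank, 1 = most frequent (0 is left unused)."""
    counts = Counter(word for text in texts for word in text.split())
    ranked = [word for word, _ in counts.most_common()]
    return {word: rank for rank, word in enumerate(ranked, start=1)}

toy_texts = ["the state and the soul", "the soul of the state"]
word_index = build_word_index(toy_texts)
toy_vocab_size = len(word_index) + 1  # +1 because index 0 is reserved
toy_encoded = [[word_index[w] for w in text.split()] for text in toy_texts]
print(word_index)
print(toy_vocab_size)
print(toy_encoded)
```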

In [20]:
# separate sequences into input and output
X = encoded[:,:-1]
y = encoded[:,-1]

In [21]:
from tensorflow.keras.utils import to_categorical
y = to_categorical(y, num_classes=vocab_size)

In [22]:
seq_length = X.shape[1]
seq_length

50

In [23]:
X.shape

(4242, 50)

In [24]:
y.shape

(4242, 10490)
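`to_categorical` expands every target index into a row of `vocab_size` values with a single 1, which is where the `(4242, 10490)` shape comes from. The sketch below shows the expansion on two toy labels and estimates the size of the full target matrix, assuming 4 bytes per value (float32, the default dtype of `to_categorical`). `one_hot` and `dense_target_bytes` are illustrative helpers.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a single 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

def dense_target_bytes(n_samples, num_classes, bytes_per_value=4):
    """Size in bytes of a dense one-hot target matrix."""
    return n_samples * num_classes * bytes_per_value

print(one_hot([2, 0], num_classes=4))
print(dense_target_bytes(4242, 10490) / 1e6, "MB")
```

For larger corpora, one option is to keep `y` as integer indices and compile with Keras's `sparse_categorical_crossentropy` loss, which avoids building this matrix at all.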

## Trial_1 using SimpleRNN, Embedding, Dense, Dropout

In [25]:
from keras.models import Sequential
from keras.layers import Embedding, Dense, Dropout, SimpleRNN

In [26]:
# define model
model_T1 = Sequential()
model_T1.add(Embedding(vocab_size,50,input_length=seq_length))
model_T1.add(SimpleRNN(200, return_sequences=True))
model_T1.add(SimpleRNN(200))
model_T1.add(Dropout(0.2))
model_T1.add(Dense(vocab_size, activation='softmax'))

In [27]:
model_T1.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 50, 50)            524500    
                                                                 
 simple_rnn (SimpleRNN)      (None, 50, 200)           50200     
                                                                 
 simple_rnn_1 (SimpleRNN)    (None, 200)               80200     
                                                                 
 dropout (Dropout)           (None, 200)               0         
                                                                 
 dense (Dense)               (None, 10490)             2108490   
                                                                 
Total params: 2,763,390
Trainable params: 2,763,390
Non-trainable params: 0
_________________________________________________________________
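The parameter counts in the summary can be checked by hand. An embedding holds one vector per vocabulary entry, a `SimpleRNN` layer has an input weight matrix, a recurrent weight matrix and a bias, and a dense layer has a weight matrix and a bias. The helper names below are illustrative; the layer sizes are the ones used in this trial.

```python
def embedding_params(vocab_size, dim):
    """One `dim`-sized vector per vocabulary entry."""
    return vocab_size * dim

def simple_rnn_params(input_dim, units):
    """Input weights + recurrent weights + bias."""
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    """Weight matrix + bias."""
    return input_dim * units + units

vocab_size, embed_dim, units = 10490, 50, 200
counts = [
    embedding_params(vocab_size, embed_dim),  # embedding
    simple_rnn_params(embed_dim, units),      # simple_rnn
    simple_rnn_params(units, units),          # simple_rnn_1
    dense_params(units, vocab_size),          # dense
]
print(counts, sum(counts))
```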


In [28]:
model_T1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [29]:
model_T1.fit(X, y, batch_size=128, epochs=300)

Epoch 1/300
...
Epoch 78

<keras.callbacks.History at 0x7f6c931aaf70>

In [30]:
from keras_preprocessing.sequence import pad_sequences

# generate n_words of text from a language model, one word at a time (greedy decoding)
def generate_seq(model, tokenizer, seq_length, seed_text, n_words):
    result = []
    in_text = seed_text
    for _ in range(n_words):
        # encode the running text and keep only the last seq_length word indices
        encoded = tokenizer.texts_to_sequences([in_text])[0]
        encoded = pad_sequences([encoded], maxlen=seq_length, truncating='pre')
        # index of the most probable next word
        probs = model.predict(encoded, verbose=0)
        yhat = int(np.argmax(probs, axis=1)[0])
        # map the index back to its word (index 0 is padding and has no word)
        out_word = tokenizer.index_word.get(yhat, '')
        in_text += ' ' + out_word
        result.append(out_word)
    return ' '.join(result)
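`generate_seq` always takes the single most probable next word (`np.argmax`), which is known as greedy decoding and can lead to repeated words. Sampling from the predicted distribution, optionally rescaled by a temperature, is a common alternative. The sketch below is not part of the notebook: `sample_with_temperature` is a hypothetical helper, and `toy_probs` stands in for one row of `model.predict` output.

```python
import math
import random

def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Draw an index from `probs` after rescaling the distribution by `temperature`.

    Values below 1 sharpen the distribution (closer to argmax);
    values above 1 flatten it (more varied output).
    """
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]

toy_probs = [0.05, 0.6, 0.25, 0.1]  # made-up next-word probabilities
rng = random.Random(0)              # seeded for repeatable draws
greedy = max(range(len(toy_probs)), key=toy_probs.__getitem__)
samples = [sample_with_temperature(toy_probs, 0.8, rng) for _ in range(10)]
print(greedy)
print(samples)
```

Inside `generate_seq`, the sampled index would take the place of the `np.argmax` result.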

In [31]:
from random import randint
# select a random line of the text data as the seed (randint includes both endpoints)
random_text = sequences[randint(0, len(sequences) - 1)]
print(random_text + '\n')

the community or if they are able to speak they turn falsewitnesses and informers small catalogue of crimes truly even if the perpetrators are yes i said but small and great are relative terms and no crimes which are committed by them approach those of the tyrant whom this class growing
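`random.randint(a, b)` can return `b` itself, so an index drawn with an upper bound equal to the list length can fall one past the end of the list. The short sketch below shows the difference between `randint`, `randrange` and `choice` on a toy list; `items` is a made-up example.

```python
import random

items = ["a", "b", "c"]
rng = random.Random(0)  # seeded so the draws are repeatable

# randint(0, len(items)) includes len(items), which is not a valid index.
drawn = {rng.randint(0, len(items)) for _ in range(1000)}
print(sorted(drawn))

# randrange(len(items)) excludes the upper bound, so every result is a valid index.
safe = {rng.randrange(len(items)) for _ in range(1000)}
print(sorted(safe))

# choice picks an element directly and needs no index arithmetic.
print(rng.choice(items) in items)
```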



In [32]:
generated = generate_seq(model_T1, tokenizer, seq_length, random_text, 50) 
print(generated)

if the moral book nature that are always will a defence will all in so tales stranger tyrant that are know will have that are do will a very will have as say we we his spirit of be claims as is his opposite who the good certainly we temperance


## Trial_2 using LSTM, Embedding, Dense, Dropout

In [35]:
from keras.models import Sequential
from keras.layers import LSTM, Embedding, Dense, Dropout

In [36]:
# define model
model_T2 = Sequential()
model_T2.add(Embedding(vocab_size,50,input_length=seq_length))
model_T2.add(LSTM(200, return_sequences=True))
model_T2.add(LSTM(200))
model_T2.add(Dropout(0.2))
model_T2.add(Dense(vocab_size, activation='softmax'))

In [37]:
model_T2.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 50, 50)            524500    
                                                                 
 lstm (LSTM)                 (None, 50, 200)           200800    
                                                                 
 lstm_1 (LSTM)               (None, 200)               320800    
                                                                 
 dropout_1 (Dropout)         (None, 200)               0         
                                                                 
 dense_1 (Dense)             (None, 10490)             2108490   
                                                                 
Total params: 3,154,590
Trainable params: 3,154,590
Non-trainable params: 0
_________________________________________________________________
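An LSTM layer holds four sets of weights (input gate, forget gate, cell candidate and output gate), each the size of a `SimpleRNN` layer's weights, which explains why the recurrent layers here have four times as many parameters as in Trial_1. A quick check, with `lstm_params` as an illustrative helper:

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * units * (input_dim + units + 1)

embedding = 10490 * 50           # embedding_1
first = lstm_params(50, 200)     # lstm
second = lstm_params(200, 200)   # lstm_1
dense = 200 * 10490 + 10490      # dense_1
print(first, second, embedding + first + second + dense)
```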


In [38]:
model_T2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [39]:
model_T2.fit(X, y, batch_size=128, epochs=200)

Epoch 1/200
...
Epoch 78

<keras.callbacks.History at 0x7f6c2c271070>

In [41]:
from random import randint
# select a random line of the text data as the seed (randint includes both endpoints)
random_text = sequences[randint(0, len(sequences) - 1)]
print(random_text + '\n')

that the word which you have uttered is one at which numerous persons and very respectable persons too in a figure pulling off their coats all in a moment and seizing any weapon that comes to hand will run at you might and main before you know where you are intending



In [42]:
generated = generate_seq(model_T2, tokenizer, seq_length, random_text, 50) 
print(generated)

us things with are be up why then that i said yes will if in the state of them as this having will if is another an abroad between in a musician noticed aware that are all no sleepy public gradually cruel perfect tale of which the other important such


## Trial_3 using GRU, Embedding, Dense, SpatialDropout1D

In [43]:
from keras.models import Sequential
from keras.layers import Embedding, Dense,SpatialDropout1D, GRU

In [44]:
# define model
model_T3 = Sequential()
model_T3.add(Embedding(vocab_size,50,input_length=seq_length))
model_T3.add(SpatialDropout1D(0.2))
model_T3.add(GRU(200, return_sequences=True))
model_T3.add(GRU(200))
model_T3.add(Dense(vocab_size, activation='softmax'))

In [45]:
model_T3.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, 50, 50)            524500    
                                                                 
 spatial_dropout1d (SpatialD  (None, 50, 50)           0         
 ropout1D)                                                       
                                                                 
 gru (GRU)                   (None, 50, 200)           151200    
                                                                 
 gru_1 (GRU)                 (None, 200)               241200    
                                                                 
 dense_2 (Dense)             (None, 10490)             2108490   
                                                                 
Total params: 3,025,390
Trainable params: 3,025,390
Non-trainable params: 0
____________________________________________
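A GRU layer has three sets of weights (update gate, reset gate and candidate state). The counts in the summary match the variant with two bias vectors per gate, which is what Keras uses when `reset_after=True` (the default in TensorFlow 2). The sketch below reproduces the numbers; `gru_params` is an illustrative helper.

```python
def gru_params(input_dim, units, reset_after=True):
    """Three gates; with reset_after=True each gate has two bias vectors."""
    biases = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + biases)

embedding = 10490 * 50          # embedding_2
first = gru_params(50, 200)     # gru
second = gru_params(200, 200)   # gru_1
dense = 200 * 10490 + 10490     # dense_2
print(first, second, embedding + first + second + dense)
```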

In [46]:
model_T3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [47]:
model_T3.fit(X, y, batch_size=128, epochs=200)

Epoch 1/200
...
Epoch 78

<keras.callbacks.History at 0x7f6cfb3233a0>

In [49]:
from random import randint
# select a random line of the text data as the seed (randint includes both endpoints)
random_text = sequences[randint(0, len(sequences) - 1)]
print(random_text + '\n')

as the community of women and children the community of property and the constitution of the state the population is divided into two of husbandmen and the other of warriors from this latter is taken a third class of counsellors and rulers of the state but socrates has not determined whether



In [50]:
generated = generate_seq(model_T3, tokenizer, seq_length, random_text, 50) 
print(generated)

sounds out which which men i said no things are are there which has there are say made yet first we have have have but there have be not under things into all no not not if who no or the best but they the good of the good of


## Trial_4 using Bidirectional RNN (LSTM), Embedding, Dense, SpatialDropout1D

In [51]:
from keras.models import Sequential
from keras.layers import Embedding, Dense,SpatialDropout1D, Bidirectional, LSTM

In [52]:
# define model
model_T4 = Sequential()
model_T4.add(Embedding(vocab_size,50,input_length=seq_length))
model_T4.add(SpatialDropout1D(0.2))
model_T4.add(Bidirectional(LSTM(200, return_sequences=True)))
model_T4.add(Bidirectional(LSTM(200, return_sequences=False)))
model_T4.add(Dense(vocab_size, activation='softmax'))

In [53]:
model_T4.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, 50, 50)            524500    
                                                                 
 spatial_dropout1d_1 (Spatia  (None, 50, 50)           0         
 lDropout1D)                                                     
                                                                 
 bidirectional (Bidirectiona  (None, 50, 400)          401600    
 l)                                                              
                                                                 
 bidirectional_1 (Bidirectio  (None, 400)              961600    
 nal)                                                            
                                                                 
 dense_3 (Dense)             (None, 10490)             4206490   
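A `Bidirectional` wrapper trains two independent copies of the wrapped layer, one reading the sequence forwards and one backwards, and by default concatenates their outputs. That doubles both the parameter count and the output width, so the second recurrent layer and the dense layer see 400 inputs instead of 200. The sketch below reproduces the three counts listed above; the helper names are illustrative.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * units * (input_dim + units + 1)

def bidirectional_params(layer_params, input_dim, units):
    """Two independent copies of the wrapped layer (forward and backward)."""
    return 2 * layer_params(input_dim, units)

units = 200
first = bidirectional_params(lstm_params, 50, units)          # bidirectional
second = bidirectional_params(lstm_params, 2 * units, units)  # bidirectional_1
dense = (2 * units) * 10490 + 10490                           # dense_3
print(first, second, dense)
```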
                                                      

In [54]:
model_T4.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [55]:
model_T4.fit(X, y, batch_size=128, epochs=200)

Epoch 1/200
...
Epoch 78

<keras.callbacks.History at 0x7f6c24360370>

In [57]:
from random import randint
# select a random line of the text data as the seed (randint includes both endpoints)
random_text = sequences[randint(0, len(sequences) - 1)]
print(random_text + '\n')

to what do you refer we were saying if i am not mistaken that he who wanted to see them in their perfect beauty must take a longer and more circuitous way at the end of which they would appear but that we could add on a popular exposition of them



In [58]:
generated = generate_seq(model_T4, tokenizer, seq_length, random_text, 50) 
print(generated)

from be required from them to men to be yet perfectly ferocity and the whole able and the whole and their state but a one a human important which man he will will him must him that the judge not not have what are a more nature gymnastics true which


## Trial_5 using Bidirectional RNN (GRU), Embedding, Dense, SpatialDropout1D

In [59]:
from keras.models import Sequential
from keras.layers import Embedding, Dense,SpatialDropout1D, Bidirectional, GRU

In [60]:
# define model
model_T5 = Sequential()
model_T5.add(Embedding(vocab_size,50,input_length=seq_length))
model_T5.add(SpatialDropout1D(0.2))
model_T5.add(Bidirectional(GRU(100, return_sequences=True,activation="tanh", recurrent_activation="sigmoid",dropout=0.1,recurrent_dropout=0.1)))
model_T5.add(Bidirectional(GRU(100, return_sequences=False,activation="tanh", recurrent_activation="sigmoid",dropout=0.1,recurrent_dropout=0.1)))
model_T5.add(Dense(vocab_size, activation='softmax'))



In [61]:
Bidirectional(GRU(100, return_sequences=True,activation="tanh", recurrent_activation="sigmoid",dropout=0.1,recurrent_dropout=0.1))




<keras.layers.rnn.bidirectional.Bidirectional at 0x7f6bc34a95b0>

In [62]:
model_T5.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_4 (Embedding)     (None, 50, 50)            524500    
                                                                 
 spatial_dropout1d_2 (Spatia  (None, 50, 50)           0         
 lDropout1D)                                                     
                                                                 
 bidirectional_2 (Bidirectio  (None, 50, 200)          91200     
 nal)                                                            
                                                                 
 bidirectional_3 (Bidirectio  (None, 200)              181200    
 nal)                                                            
                                                                 
 dense_4 (Dense)             (None, 10490)             2108490   
                                                      

In [63]:
model_T5.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [64]:
model_T5.fit(X, y, batch_size=128, epochs=200)

Epoch 1/200
...
Epoch 78

<keras.callbacks.History at 0x7f6bc32f9eb0>

In [66]:
from random import randint
# select a random line of the text data as the seed (randint includes both endpoints)
random_text = sequences[randint(0, len(sequences) - 1)]
print(random_text + '\n')

yes yes my good sir and there will be no better in which to look for a government why because of the liberty which reigns have a complete assortment of constitutions and he who has a mind to establish a state as we have been doing must go to a democracy



In [67]:
generated = generate_seq(model_T5, tokenizer, seq_length, random_text, 50) 
print(generated)

i made but i objects whom your pauper world well admit how who if disposed feeling almost tyrant only who see how than modes exception i such then will a faculty yes may my wits going important important dream office proceed but justice may give not so last yes us
