TEXT

In [1]:
faqs = """
  Title: "The Lost Diary"

Once upon a time in a quaint little town, there lived a young woman named Emily. She was an avid book lover, and her most treasured possession was her grandmother's old diary. The diary was filled with enchanting stories and hidden secrets from a time long past. Emily's grandmother had always told her that it held the key to something extraordinary.

Emily decided to embark on a journey to unlock the diary's mysteries. She had recently taken up a course on Natural Language Processing, and she thought it would be a perfect opportunity to put her skills to the test. Using a Long Short-Term Memory (LSTM) neural network, she aimed to predict the next words in the diary's entries.

The diary, with its fragile pages and faded ink, was a true relic of a bygone era. Its stories told of love, adventure, and long-lost treasures. Emily meticulously transcribed the diary's contents into a digital format and began training her LSTM model.

As days turned into weeks, Emily's model started to generate remarkable predictions for the next words in the diary. The sentences began to flow, and she could almost hear her grandmother's voice as she read the entries. It was as if the diary itself was telling her the next chapter of its story.

One evening, while Emily was deeply engrossed in her work, the LSTM model suddenly generated a sentence that took her breath away. It read, "Beneath the old oak tree, near the river's bend, lies a hidden chest filled with secrets untold."

Her heart raced as she realized that this might be the clue she had been searching for. She knew exactly where the oak tree and the river's bend were, as they were described in her grandmother's stories. Emily decided to follow the diary's guidance and set off on a quest to find the hidden chest.

Guided by the diary's predictions, Emily found herself at the designated spot. With a spade in hand, she started to dig beneath the old oak tree. The soil was soft, and it didn't take long for her to uncover a wooden chest. She opened it with trembling hands, and inside, she found a collection of letters, maps, and a piece of jewelry that had been lost for generations.

The letters revealed an epic love story from her grandmother's youth, and the maps led to a long-forgotten treasure buried deep in the forest. The piece of jewelry had a personal significance that Emily could scarcely comprehend.

Emily's journey had not only unlocked the mysteries of the diary but had also connected her with her family's rich history. She returned home with a sense of fulfillment, cherishing the memories and treasures she had uncovered.

In the end, Emily's work with the LSTM model had not only predicted words; it had predicted an adventure that brought her closer to her grandmother and her own past. The lost diary had become a found treasure, and Emily knew that her grandmother's legacy would live on through the stories she had uncovered.


"""

Importing Dependencies

In [2]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer


In [3]:
tokenizer = Tokenizer()

In [4]:
tokenizer.fit_on_texts([faqs])

In [5]:
tokenizer.word_index

{'the': 1,
 'a': 2,
 'her': 3,
 'and': 4,
 'she': 5,
 'to': 6,
 'had': 7,
 'diary': 8,
 'in': 9,
 'emily': 10,
 'was': 11,
 'with': 12,
 'of': 13,
 'that': 14,
 'it': 15,
 "grandmother's": 16,
 'long': 17,
 "diary's": 18,
 'as': 19,
 'lost': 20,
 'stories': 21,
 "emily's": 22,
 'on': 23,
 'lstm': 24,
 'model': 25,
 'for': 26,
 'an': 27,
 'old': 28,
 'hidden': 29,
 'next': 30,
 'words': 31,
 'its': 32,
 'oak': 33,
 'tree': 34,
 'chest': 35,
 'found': 36,
 'time': 37,
 'filled': 38,
 'secrets': 39,
 'from': 40,
 'past': 41,
 'grandmother': 42,
 'told': 43,
 'decided': 44,
 'journey': 45,
 'mysteries': 46,
 'would': 47,
 'be': 48,
 'entries': 49,
 'love': 50,
 'adventure': 51,
 'treasures': 52,
 'into': 53,
 'began': 54,
 'started': 55,
 'predictions': 56,
 'could': 57,
 'read': 58,
 'story': 59,
 'work': 60,
 'beneath': 61,
 "river's": 62,
 'bend': 63,
 'been': 64,
 'knew': 65,
 'were': 66,
 'letters': 67,
 'maps': 68,
 'piece': 69,
 'jewelry': 70,
 'treasure': 71,
 'not': 72,
 'only': 7

In [6]:
len(tokenizer.word_index)

234
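
As a rough illustration of what `fit_on_texts` produces, the sketch below builds a frequency-ranked word index in plain Python. It is a simplified stand-in, not the Keras implementation: the helper name `build_word_index` and the regex-based cleanup are assumptions made for this example. Ranks start at 1, so index 0 stays free for padding, which is why a vocabulary of 234 words needs at least 235 output classes later on.

```python
import re
from collections import Counter

def build_word_index(text):
    """Hypothetical helper: map each word to a rank, most frequent first, starting at 1."""
    # Lowercase and keep only letters and apostrophes
    # (an approximation of the Tokenizer's default filtering, assumed for this sketch).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

toy_index = build_word_index("The cat and the dog and the bird")
print(toy_index)
```

On this toy sentence, "the" gets rank 1 and "and" gets rank 2, mirroring how the most frequent words in the story receive the smallest ids.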

In [7]:
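# Dry run: tokenize each line of the story. The result is overwritten on every
# iteration, so nothing is kept here; the next cell collects the sequences.
for sentence in faqs.split('\n'):
  tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]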

In [8]:
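input_sequences = []
for sentence in faqs.split('\n'):
  # Convert one line of text into a list of word ids.
  tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]

  # Emit every prefix of length >= 2; the last id of each prefix is the word to predict.
  for i in range(1, len(tokenized_sentence)):
    input_sequences.append(tokenized_sentence[:i+1])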
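
The prefix expansion in the loop above can be checked in isolation. The sketch below applies the same slicing to a single toy list of ids; the helper name `prefix_sequences` is hypothetical and exists only for this illustration.

```python
def prefix_sequences(token_ids):
    """Hypothetical helper: return every prefix of token_ids with length >= 2."""
    return [token_ids[: i + 1] for i in range(1, len(token_ids))]

toy_line = [76, 1, 20, 8]  # toy ids for a four-word line
toy_prefixes = prefix_sequences(toy_line)
print(toy_prefixes)  # [[76, 1], [76, 1, 20], [76, 1, 20, 8]]
```

A line of n words yields n - 1 training examples, and lines with fewer than two words contribute none.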

In [9]:
input_sequences

[[76, 1],
 [76, 1, 20],
 [76, 1, 20, 8],
 [77, 78],
 [77, 78, 2],
 [77, 78, 2, 37],
 [77, 78, 2, 37, 9],
 [77, 78, 2, 37, 9, 2],
 [77, 78, 2, 37, 9, 2, 79],
 [77, 78, 2, 37, 9, 2, 79, 80],
 [77, 78, 2, 37, 9, 2, 79, 80, 81],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83, 2],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83, 2, 84],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83, 2, 84, 85],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83, 2, 84, 85, 86],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83, 2, 84, 85, 86, 10],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83, 2, 84, 85, 86, 10, 5],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83, 2, 84, 85, 86, 10, 5, 11],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83, 2, 84, 85, 86, 10, 5, 11, 27],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83, 2, 84, 85, 86, 10, 5, 11, 27, 87],
 [77, 78, 2, 37, 9, 2, 79, 80, 81, 82, 83, 2, 84, 85, 86, 10, 5, 11, 27, 87, 88],
 ...


In [10]:
# Length of the longest prefix sequence; every sequence is padded to this length.
max_len = max(len(seq) for seq in input_sequences)

In [11]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Left-pad with zeros so the word to predict always sits in the last column.
padded_input_sequences = pad_sequences(input_sequences, maxlen=max_len, padding='pre')

In [12]:
padded_input_sequences

array([[ 0,  0,  0, ...,  0, 76,  1],
       [ 0,  0,  0, ..., 76,  1, 20],
       [ 0,  0,  0, ...,  1, 20,  8],
       ...,
       [ 0,  0,  0, ...,  1, 21,  5],
       [ 0,  0,  0, ..., 21,  5,  7],
       [ 0,  0,  0, ...,  5,  7, 74]], dtype=int32)
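
To make the effect of `padding='pre'` concrete, here is a minimal pure-Python sketch of left-padding. The helper `pre_pad` is hypothetical and only approximates `pad_sequences`: it assumes no sequence is longer than `maxlen`, so it does not reproduce the truncation behavior of the Keras function.

```python
def pre_pad(sequences, maxlen, value=0):
    """Hypothetical helper: left-pad each sequence with `value` up to `maxlen`."""
    padded = []
    for seq in sequences:
        if len(seq) > maxlen:
            # Truncation is deliberately left out of this sketch.
            raise ValueError("sequence longer than maxlen")
        padded.append([value] * (maxlen - len(seq)) + list(seq))
    return padded

toy_sequences = [[76, 1], [76, 1, 20], [76, 1, 20, 8]]
toy_padded = pre_pad(toy_sequences, maxlen=4)
print(toy_padded)  # [[0, 0, 76, 1], [0, 76, 1, 20], [76, 1, 20, 8]]
```

Padding on the left keeps the most recent words next to the target, which is the layout the array above shows.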

In [13]:
# Inputs: every column except the last (the context words).
x = padded_input_sequences[:,:-1]

In [14]:
# Targets: the last column (the id of the next word).
y = padded_input_sequences[:,-1]

In [15]:
x.shape

(502, 67)

In [16]:
y.shape

(502,)
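
The same split can be traced on a tiny padded matrix. This sketch uses plain Python lists instead of NumPy slicing, and the toy values are made up for illustration.

```python
toy_padded = [[0, 0, 76, 1], [0, 76, 1, 20], [76, 1, 20, 8]]

# Context: everything but the last id in each row.
toy_x = [row[:-1] for row in toy_padded]
# Target: the last id in each row.
toy_y = [row[-1] for row in toy_padded]

print(toy_x)  # [[0, 0, 76], [0, 76, 1], [76, 1, 20]]
print(toy_y)  # [1, 20, 8]
```

Each input row is one element shorter than the padded row, which is why `x` has 67 columns when the longest sequence has 68 ids.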

In [17]:
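from tensorflow.keras.utils import to_categorical
# One-hot encode the targets. len(tokenizer.word_index) is 234 and index 0 is reserved
# for padding, so 235 classes would suffice; 236 leaves one class unused.
y = to_categorical(y, num_classes=236)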

In [18]:
y.shape

(502, 236)

In [19]:
y

array([[0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
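
For intuition, one-hot encoding a single target id looks like the sketch below. The helper `one_hot` is hypothetical and written in plain Python; it is not how `to_categorical` is implemented, only an equivalent for one label.

```python
def one_hot(index, num_classes):
    """Hypothetical helper: return a list of zeros with a single 1.0 at `index`."""
    if not 0 <= index < num_classes:
        raise ValueError("index must be in [0, num_classes)")
    row = [0.0] * num_classes
    row[index] = 1.0
    return row

toy_row = one_hot(1, num_classes=6)
print(toy_row)  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

The first row of `y` above has its 1 at position 1 because the first training sequence, `[76, 1]`, has word id 1 as its target. The number of classes has to exceed the largest word id, otherwise the index falls outside the row.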

In [20]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

In [21]:
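model = Sequential()
# Map each of the 236 possible ids to a 100-dimensional vector; 67 = x.shape[1].
model.add(Embedding(236,100, input_length=67))
# A single LSTM layer that summarizes the context into a 150-dimensional state.
model.add(LSTM(150))
# Softmax over the 236 classes gives a probability for each candidate next word.
model.add(Dense(236, activation = 'softmax'))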

In [22]:
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])

In [23]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 67, 100)           23600     
                                                                 
 lstm (LSTM)                 (None, 150)               150600    
                                                                 
 dense (Dense)               (None, 236)               35636     
                                                                 
Total params: 209836 (819.67 KB)
Trainable params: 209836 (819.67 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
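
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and a bias) and uses only the layer sizes from the model definition; the variable names are chosen for this example.

```python
vocab_size, embed_dim, lstm_units = 236, 100, 150

# Embedding: one embed_dim-sized vector per vocabulary id.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias vector.
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)

# Dense: a weight matrix from the LSTM state to every class, plus one bias per class.
dense_params = lstm_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 23600 150600 35636 209836

# Size in KB, assuming 4 bytes (float32) per parameter.
print(round(total_params * 4 / 1024, 2))  # 819.67
```

The LSTM holds most of the weights, so changing the number of units affects model size far more than changing the vocabulary by a few entries.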
