In [None]:
docs_qn = """When was NASA founded and why?
NASA was founded on July 29, 1958, after the Soviet Union’s launch of Sputnik 1 (1957) shocked the United States. It was created to coordinate America’s space research and compete in the space race.
What does NASA stand for?
NASA stands for the National Aeronautics and Space Administration.
What organization existed before NASA?
Before NASA, the National Advisory Committee for Aeronautics (NACA) was responsible for aeronautical research from 1915 to 1958. NASA absorbed NACA when it was created.
What was NASA’s first human spaceflight program?
The Mercury Program (1958–1963) was NASA’s first human spaceflight project, aiming to send the first Americans into space.
Who was the first American astronaut in space?
Alan Shepard, aboard Freedom 7 on May 5, 1961, became the first American in space.
What was Project Gemini?
Project Gemini (1961–1966) tested spacecraft maneuvers and spacewalking techniques necessary for future Moon missions under the Apollo program.
What was the goal of the Apollo Program?
The Apollo Program (1961–1972) aimed to land humans on the Moon and return them safely to Earth — fulfilling President John F. Kennedy’s 1961 promise.
Who were the first humans to land on the Moon?
Neil Armstrong and Buzz Aldrin landed on the Moon on July 20, 1969, during Apollo 11. Armstrong’s quote: “That’s one small step for man, one giant leap for mankind.”
What caused the Apollo 1 tragedy?
During a ground test in 1967, a cabin fire killed astronauts Gus Grissom, Ed White, and Roger B. Chaffee due to an electrical fault in an oxygen-rich environment.
What was the importance of Apollo 13?
Apollo 13 (1970) suffered an oxygen tank explosion, but NASA engineers and the crew’s ingenuity brought them home safely — famously called a “successful failure.”
What was Skylab?
Skylab (1973–1979) was NASA’s first space station, used to study long-duration spaceflight effects on humans and conduct solar observations.
What was the Space Shuttle Program?
Launched in 1981, the Space Shuttle Program created reusable spacecraft for satellite launches, repairs, and scientific missions until 2011.
What happened in the Challenger disaster?
On January 28, 1986, Space Shuttle Challenger broke apart 73 seconds after launch, killing all seven astronauts due to a failed O-ring seal in a solid rocket booster.
What was the Columbia disaster?
On February 1, 2003, Space Shuttle Columbia disintegrated upon re-entry because of a damaged thermal protection tile caused by foam strike during launch.
What is the Hubble Space Telescope and why is it important?
Launched in 1990, the Hubble Space Telescope provided deep-space images, revealing the age of the universe, the existence of dark energy, and stunning galaxies and nebulae.
What is the International Space Station (ISS)?
The ISS, operational since 2000, is a multinational space laboratory orbiting Earth, jointly built by NASA, Roscosmos, ESA, JAXA, and CSA, supporting long-term microgravity research.
What was the Mars Rover Spirit and Opportunity’s mission?
Launched in 2003, Spirit and Opportunity were twin rovers sent to explore Mars’ surface. Opportunity lasted nearly 15 years, discovering signs of past water.
What is the James Webb Space Telescope (JWST)?
Launched in 2021, JWST is NASA’s most powerful telescope, designed to observe the earliest galaxies and study exoplanets’ atmospheres.
What is NASA’s Artemis Program?
The Artemis Program, begun in the late 2010s, aims to return humans to the Moon by 2026, including the first woman and person of color, as a step toward Mars exploration.
What are NASA’s future goals?
NASA plans to build a permanent lunar base (Artemis Base Camp), establish the Lunar Gateway space station, send humans to Mars, and advance research in aeronautics and climate science."""

In [None]:
# Turn the text into a supervised learning problem: each prefix of a line is used to predict its next word

In [None]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.utils import pad_sequences
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [None]:
tokenizer = Tokenizer()

In [15]:
tokenizer.fit_on_texts([docs_qn])

In [17]:
list(tokenizer.word_index.items())[:15]

[('the', 1),
 ('and', 2),
 ('space', 3),
 ('what', 4),
 ('was', 5),
 ('to', 6),
 ('in', 7),
 ('nasa', 8),
 ('on', 9),
 ('program', 10),
 ('a', 11),
 ('of', 12),
 ('for', 13),
 ('first', 14),
 ('apollo', 15)]

In [20]:
len(tokenizer.word_index)

312

In [None]:
for items in docs_qn.split("\n"):
    print(tokenizer.texts_to_sequences([items])[0])

[37, 5, 8, 38, 2, 39]
[8, 5, 38, 9, 40, 85, 41, 42, 1, 86, 87, 24, 12, 88, 25, 89, 90, 1, 91, 92, 26, 5, 27, 6, 93, 94, 3, 20, 2, 95, 7, 1, 3, 96]
[4, 97, 8, 98, 13]
[8, 99, 13, 1, 43, 28, 2, 3, 100]
[4, 101, 102, 44, 8]
[44, 8, 1, 43, 103, 104, 13, 28, 45, 5, 105, 13, 106, 20, 107, 108, 6, 41, 8, 109, 45, 37, 26, 5, 27]
[4, 5, 17, 14, 46, 29, 10]
[1, 110, 10, 111, 5, 17, 14, 46, 29, 30, 112, 6, 47, 1, 14, 113, 114, 3]
[48, 5, 1, 14, 49, 115, 7, 3]
[116, 117, 118, 119, 120, 9, 121, 122, 50, 123, 1, 14, 49, 7, 3]
[4, 5, 30, 51]
[30, 51, 124, 125, 52, 126, 2, 127, 128, 129, 13, 53, 18, 54, 130, 1, 15, 10]
[4, 5, 1, 131, 12, 1, 15, 10]
[1, 15, 10, 132, 133, 6, 55, 19, 9, 1, 18, 2, 56, 57, 58, 6, 59, 60, 134, 135, 136, 137, 138, 50, 139]
[48, 61, 1, 14, 19, 6, 55, 9, 1, 18]
[140, 141, 2, 142, 143, 144, 9, 1, 18, 9, 40, 145, 146, 31, 15, 147, 148, 149, 150, 62, 151, 63, 13, 152, 62, 153, 154, 13, 155, 64]
[4, 65, 1, 15, 25, 156]
[31, 11, 157, 158, 7, 159, 11, 160, 161, 162, 66, 163, 164, 16

In [None]:
input_sequence = []
for items in docs_qn.split("\n"):
    tokenized_sentence = tokenizer.texts_to_sequences([items])[0]

    for i in range(1, len(tokenized_sentence)):
        input_sequence.append(tokenized_sentence[: i + 1])

In [26]:
input_sequence[:15]

[[37, 5],
 [37, 5, 8],
 [37, 5, 8, 38],
 [37, 5, 8, 38, 2],
 [37, 5, 8, 38, 2, 39],
 [8, 5],
 [8, 5, 38],
 [8, 5, 38, 9],
 [8, 5, 38, 9, 40],
 [8, 5, 38, 9, 40, 85],
 [8, 5, 38, 9, 40, 85, 41],
 [8, 5, 38, 9, 40, 85, 41, 42],
 [8, 5, 38, 9, 40, 85, 41, 42, 1],
 [8, 5, 38, 9, 40, 85, 41, 42, 1, 86],
 [8, 5, 38, 9, 40, 85, 41, 42, 1, 86, 87]]

In [28]:
[len(x) for x in input_sequence][:15]

[2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
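The loop above expands each tokenized line into all of its prefixes of length two or more, so the last token of every prefix can later serve as the label. A minimal pure-Python sketch of that idea follows; `make_prefixes` is a hypothetical helper for illustration and is not part of the notebook.

```python
def make_prefixes(tokens):
    """Return every prefix of `tokens` that contains at least two elements."""
    return [tokens[: i + 1] for i in range(1, len(tokens))]


# First tokenized line from the notebook output above.
first_line = [37, 5, 8, 38, 2, 39]
prefixes = make_prefixes(first_line)

for prefix in prefixes:
    print(prefix)
```

A line with n tokens yields n - 1 prefixes, and a single-token line yields none.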

In [29]:
max([len(x) for x in input_sequence])

34

In [None]:
max_length = max([len(x) for x in input_sequence])


## The input sequences have different lengths, so padding is **necessary**


In [None]:
pad_sequences(input_sequence, maxlen=max_length, padding="pre")

array([[  0,   0,   0, ...,   0,  37,   5],
       [  0,   0,   0, ...,  37,   5,   8],
       [  0,   0,   0, ...,   5,   8,  38],
       ...,
       [  0,   0,   0, ...,   7,  28,   2],
       [  0,   0,   0, ...,  28,   2, 311],
       [  0,   0,   0, ...,   2, 311, 312]], dtype=int32)
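With `padding="pre"`, zeros are added on the left so that the real tokens sit at the end of each row, next to the position being predicted. The sketch below approximates that behaviour in plain Python, assuming a positive `maxlen`; `pre_pad` is a hypothetical helper, not the Keras implementation.

```python
def pre_pad(seq, maxlen, value=0):
    """Left-pad `seq` with `value` up to `maxlen`.

    Sequences longer than `maxlen` keep only their last `maxlen` items,
    which mirrors the default front truncation of `pad_sequences`.
    """
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq


# Three prefixes from the notebook, padded to a small width for readability.
padded = [pre_pad(s, maxlen=6) for s in [[37, 5], [37, 5, 8], [37, 5, 8, 38]]]
for row in padded:
    print(row)
```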

In [None]:
padded_input_sequence = pad_sequences(input_sequence, maxlen=max_length, padding="pre")

In [None]:
X = padded_input_sequence[:, :-1]

In [37]:
X[:5]

array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        37],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 37,
         5],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 37,  5,
         8],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 37,  5,  8,
        38],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 37,  5,  8, 38,
         2]], dtype=int32)

In [None]:
y = padded_input_sequence[:, -1]

In [39]:
y[:5]

array([ 5,  8, 38,  2, 39], dtype=int32)

In [None]:
X.shape, y.shape

((568, 33), (568,))
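Because every padded row ends with the word to be predicted, slicing off the last column gives the label and leaves 34 - 1 = 33 feature columns. A small list-based sketch of the same split, with the hypothetical helper `split_features_target` standing in for the NumPy slicing:

```python
def split_features_target(padded_rows):
    """Split each padded row into (all tokens but the last, the last token)."""
    features = [row[:-1] for row in padded_rows]
    targets = [row[-1] for row in padded_rows]
    return features, targets


# Rows padded to width 6 for illustration; the notebook uses width 34.
rows = [
    [0, 0, 0, 0, 37, 5],
    [0, 0, 0, 37, 5, 8],
    [0, 0, 37, 5, 8, 38],
]
features, targets = split_features_target(rows)
print(features)
print(targets)
```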

In [41]:
len(tokenizer.word_index)

312

In [None]:
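# 312 words in the vocabulary plus index 0, which is reserved for padding
y = to_categorical(y, num_classes=len(tokenizer.word_index) + 1)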

In [43]:
y.shape

(568, 313)
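The class count is 313 rather than 312 because the tokenizer numbers words from 1 and index 0 is left for padding, so valid class indices run from 0 to 312. The following sketch shows what one-hot encoding does to a single label; `one_hot` is a hypothetical stand-in written for illustration.

```python
def one_hot(index, num_classes):
    """Return a list of zeros with a single 1.0 at position `index`."""
    if not 0 <= index < num_classes:
        raise ValueError(f"index {index} is outside 0..{num_classes - 1}")
    row = [0.0] * num_classes
    row[index] = 1.0
    return row


vocab_size = 312 + 1  # 312 tokenizer words plus the padding index 0
encoded = one_hot(5, vocab_size)  # 5 is the first label in y[:5] above
print(len(encoded), encoded[:8])
```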

In [None]:
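# Vocabulary size: 312 words plus index 0, which pad_sequences uses for padding
vocab_size = len(tokenizer.word_index) + 1

model = Sequential()
# X has max_length - 1 columns because the last token of each row became the target
model.add(tf.keras.Input(shape=(X.shape[1],)))
model.add(Embedding(input_dim=vocab_size, output_dim=100))
model.add(LSTM(150))
model.add(Dense(vocab_size, activation="softmax"))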



In [134]:
model.summary()

In [None]:
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

In [None]:
model.fit(X, y, epochs=100)

Epoch 1/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 5s 9ms/step - accuracy: 0.0394 - loss: 5.7242
Epoch 2/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.0637 - loss: 5.3154
Epoch 3/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.0650 - loss: 5.3338
Epoch 4/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.0686 - loss: 5.1869
Epoch 5/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.0743 - loss: 5.0863
Epoch 6/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.0803 - loss: 4.9465
Epoch 7/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.0617 - loss: 4.9373
Epoch 8/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.0927 - loss: 4.7924
Epoch 9/100
18/18 ━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x7d1a41136a20>
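As a sanity check on the log above, the first-epoch loss of 5.7242 is close to what categorical cross-entropy gives when a model spreads its probability evenly over all 313 classes. The short calculation below is only an illustration of that baseline, not something the notebook itself computes.

```python
import math

vocab_size = 313  # number of output classes used by the softmax layer

# Cross-entropy of a uniform prediction: -log(1 / vocab_size) = log(vocab_size)
uniform_loss = -math.log(1.0 / vocab_size)
print(f"uniform-guess loss: {uniform_loss:.3f}")
```

A loss that starts near this value and then falls suggests the model begins from an essentially uninformed guess and is learning from the data.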

## Prediction


In [None]:
text = "Neil"

# tokenize
# padding
# predict next word

In [70]:
token = tokenizer.texts_to_sequences([text])[0]
token

[140]

In [None]:
pad_sequences([token], maxlen=max_length, padding="pre")

array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 140]], dtype=int32)

In [None]:
# Tokenize first, then pad: `token` must exist before pad_sequences uses it
token = tokenizer.texts_to_sequences([text])[0]
padded_input = pad_sequences([token], maxlen=max_length, padding="pre")

In [None]:
y_pred = model.predict(padded_input)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step


In [79]:
y_pred.shape

(1, 313)

In [78]:
y_pred

array([[7.15335148e-07, 5.69541990e-05, 6.67944725e-04, 1.63503646e-05,
        5.54542794e-07, 5.24183328e-04, 1.03954424e-03, 1.06972049e-03,
        1.87867170e-03, 9.90318018e-04, 3.31973890e-04, 1.38901721e-03,
        9.92902642e-06, 7.42410659e-04, 6.36289333e-05, 1.28037261e-03,
        2.00550203e-05, 1.01903475e-04, 1.79841845e-05, 2.02807660e-05,
        1.22755337e-05, 5.32014456e-05, 1.08581867e-06, 1.35149210e-04,
        1.51579852e-05, 3.60412319e-04, 5.42863290e-06, 1.20586819e-05,
        1.24227690e-05, 1.85228146e-05, 1.56732771e-04, 1.33294652e-05,
        1.00157449e-04, 4.44706784e-05, 1.55252783e-05, 7.15685237e-05,
        1.02621550e-03, 1.79258188e-06, 8.44358437e-05, 1.77973398e-05,
        1.82044041e-05, 1.13471133e-05, 1.71068696e-05, 8.83531684e-05,
        4.00210556e-04, 2.31004760e-05, 1.13976530e-05, 9.83410064e-06,
        6.02032003e-07, 1.46676703e-05, 1.42478011e-05, 4.93702944e-03,
        1.70482628e-04, 9.26849389e-05, 1.77319616e-05, 1.441175

In [81]:
np.argmax(y_pred)

np.int64(141)

In [None]:
for word, index in tokenizer.word_index.items():
    if index == np.argmax(y_pred):
        print(word)

armstrong
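Scanning `word_index` for a matching value works, but a reversed dictionary gives the same answer with a single lookup. The sketch below uses a three-entry excerpt of the vocabulary taken from the outputs above; the Keras `Tokenizer` also keeps such a reverse mapping in its `index_word` attribute.

```python
# Excerpt of tokenizer.word_index, using indices shown in the notebook outputs.
word_index = {"and": 2, "neil": 140, "armstrong": 141}

# Invert the mapping once: index -> word.
index_word = {index: word for word, index in word_index.items()}

predicted_index = 141  # value returned by np.argmax(y_pred) above
print(index_word.get(predicted_index))

# Index 0 is the padding slot and has no word, so .get returns None for it.
print(index_word.get(0))
```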


In [None]:
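def the_predictor(text, pred_words=10):
    """Greedily append `pred_words` predicted words to `text`, printing each step."""
    for _ in range(pred_words):
        # Tokenize the running text and pad it to the width the model was trained on
        token = tokenizer.texts_to_sequences([text])[0]
        padded_input = pad_sequences([token], maxlen=X.shape[1], padding="pre")
        y_pred = model.predict(padded_input)
        # Map the most probable index back to a word; index 0 (padding) has none
        word = tokenizer.index_word.get(int(np.argmax(y_pred)))
        if word is not None:
            text = text + " " + word
            print(text)
    print("\n answer : " + text)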
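The function above is a greedy decoding loop: at each step it takes the single most likely next word and feeds the longer text back in. The sketch below isolates that loop from Keras, under the assumption that a toy bigram table can stand in for `model.predict` plus `argmax`; `greedy_generate`, `toy_next_token`, and the table are all hypothetical.

```python
def greedy_generate(seed, next_token, max_new_tokens=10):
    """Append the predictor's top choice repeatedly, stopping early on None."""
    tokens = list(seed)
    for _ in range(max_new_tokens):
        candidate = next_token(tokens)
        if candidate is None:  # the stand-in predictor has no continuation
            break
        tokens.append(candidate)
    return tokens


# Toy stand-in for the trained model: a fixed lookup on the last word only.
bigram = {"nasa": "stands", "stands": "for", "for": "the", "the": "national"}


def toy_next_token(tokens):
    return bigram.get(tokens[-1])


result = greedy_generate(["nasa"], toy_next_token, max_new_tokens=10)
print(" ".join(result))
```

Unlike this toy, the notebook's model has no stop signal, so `the_predictor` always runs for the full `pred_words` steps.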

In [129]:
the_predictor("neil armstrong ")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
neil armstrong  and
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step
neil armstrong  and buzz
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 65ms/step
neil armstrong  and buzz aldrin
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step
neil armstrong  and buzz aldrin landed
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step
neil armstrong  and buzz aldrin landed on
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 59ms/step
neil armstrong  and buzz aldrin landed on the
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step
neil armstrong  and buzz aldrin landed on the moon
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step
neil armstrong  and buzz aldrin landed on the moon on
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step
neil armstrong  and buzz aldrin landed on t

In [130]:
the_predictor("nasa stands")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step
nasa stands for
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
nasa stands for the
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
nasa stands for the national
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
nasa stands for the national aeronautics
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
nasa stands for the national aeronautics and
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
nasa stands for the national aeronautics and space
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
nasa stands for the national aeronautics and space administration
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step
nasa stands for the national aeronautics and space administration administration
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 36ms/step
nas

In [None]:
the_predictor("NASA", pred_words=30)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step
NASA was
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step
NASA was founded
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step
NASA was founded on
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
NASA was founded on july
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
NASA was founded on july 29
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
NASA was founded on july 29 1958
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
NASA was founded on july 29 1958 after
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step
NASA was founded on july 29 1958 after the
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
NASA was founded on july 29 1958 after the soviet
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
NASA was founded