**KINEMATICS LANGUAGE MODEL**

Importing the required packages and libraries

In [0]:
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Embedding
from keras.optimizers import Adam
from random import randint

Reading the dataset (Kinematics Corpus)

In [0]:
from urllib.request import urlopen
data = urlopen('https://raw.githubusercontent.com/swa19231/Datasets/master/corpus.txt').read().decode('utf8')

In [4]:
data


'\ufeffAn airplane accelerates down a runway at 3.20 m/s2 for 32.8 s until is finally lifts off the ground. Determine the distance traveled before takeoff.\r\nA car starts from rest and accelerates uniformly over a time of 5.21 seconds for a distance of 110 m. Determine the acceleration of the car.\r\nUpton Chuck is riding the Giant Drop at Great America. If Upton free falls for 2.60 seconds, what will be his final velocity and how far will he fall?\r\nA race car accelerates uniformly from 18.5 m/s to 46.1 m/s in 2.47 seconds. Determine the acceleration of the car and the distance traveled.\r\nA feather is dropped on the moon from a height of 1.40 meters. The acceleration of gravity on the moon is 1.67 m/s2. Determine the time for the feather to fall to the surface of the moon.\r\nRocket-powered sleds are used to test the human response to acceleration. If a rocket-powered sled is accelerated to a speed of 444 m/s in 1.83 seconds, then what is the acceleration and what is the distance 

Inspecting the data shows stray sequences such as "\r\n" line endings and a leading "\ufeff" byte-order mark, so we replace them with spaces.

In [0]:
data = data.replace('\ufeff', ' ')
dataclean = data.replace('\r\n', ' ')
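The cleanup step can be tried on a small sample. The sketch below uses a made-up byte string (`raw` is an assumption, not part of the corpus). It also shows that decoding with Python's `utf-8-sig` codec strips the leading byte-order mark, which is one possible alternative to replacing `'\ufeff'` by hand.

```python
# Hypothetical two-line sample with a UTF-8 byte-order mark and Windows line endings.
raw = b'\xef\xbb\xbfA car starts from rest.\r\nA feather is dropped.'

# Approach used above: decode as utf8, then replace the BOM and CRLF with spaces.
text = raw.decode('utf8')
clean_replace = text.replace('\ufeff', ' ').replace('\r\n', ' ')

# Alternative: the 'utf-8-sig' codec drops the BOM while decoding.
clean_sig = raw.decode('utf-8-sig').replace('\r\n', ' ')

print(repr(clean_replace))
print(repr(clean_sig))
```

The first variant leaves a leading space where the byte-order mark was, just as in the `dataclean` output; the second does not.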

In [6]:
data

' An airplane accelerates down a runway at 3.20 m/s2 for 32.8 s until is finally lifts off the ground. Determine the distance traveled before takeoff.\r\nA car starts from rest and accelerates uniformly over a time of 5.21 seconds for a distance of 110 m. Determine the acceleration of the car.\r\nUpton Chuck is riding the Giant Drop at Great America. If Upton free falls for 2.60 seconds, what will be his final velocity and how far will he fall?\r\nA race car accelerates uniformly from 18.5 m/s to 46.1 m/s in 2.47 seconds. Determine the acceleration of the car and the distance traveled.\r\nA feather is dropped on the moon from a height of 1.40 meters. The acceleration of gravity on the moon is 1.67 m/s2. Determine the time for the feather to fall to the surface of the moon.\r\nRocket-powered sleds are used to test the human response to acceleration. If a rocket-powered sled is accelerated to a speed of 444 m/s in 1.83 seconds, then what is the acceleration and what is the distance that 

In [7]:
dataclean

' An airplane accelerates down a runway at 3.20 m/s2 for 32.8 s until is finally lifts off the ground. Determine the distance traveled before takeoff. A car starts from rest and accelerates uniformly over a time of 5.21 seconds for a distance of 110 m. Determine the acceleration of the car. Upton Chuck is riding the Giant Drop at Great America. If Upton free falls for 2.60 seconds, what will be his final velocity and how far will he fall? A race car accelerates uniformly from 18.5 m/s to 46.1 m/s in 2.47 seconds. Determine the acceleration of the car and the distance traveled. A feather is dropped on the moon from a height of 1.40 meters. The acceleration of gravity on the moon is 1.67 m/s2. Determine the time for the feather to fall to the surface of the moon. Rocket-powered sleds are used to test the human response to acceleration. If a rocket-powered sled is accelerated to a speed of 444 m/s in 1.83 seconds, then what is the acceleration and what is the distance that the sled travel

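Creating tokens: split on whitespace, strip punctuation, drop every token that still contains non-alphabetic characters (numerals, and units such as "m/s2"), and convert the rest to lower case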

In [0]:
import string
# split into tokens by white space
tokens = dataclean.split()
# remove punctuation from each token
table = str.maketrans('', '', string.punctuation)
tokens = [w.translate(table) for w in tokens]
# remove tokens that are not alphabetic
tokens = [word for word in tokens if word.isalpha()]
# make lower case
tokens = [word.lower() for word in tokens]


In [9]:
print(tokens[:200])
print('Total Tokens: %d' % len(tokens))
print('Unique Tokens: %d' % len(set(tokens)))

['an', 'airplane', 'accelerates', 'down', 'a', 'runway', 'at', 'for', 's', 'until', 'is', 'finally', 'lifts', 'off', 'the', 'ground', 'determine', 'the', 'distance', 'traveled', 'before', 'takeoff', 'a', 'car', 'starts', 'from', 'rest', 'and', 'accelerates', 'uniformly', 'over', 'a', 'time', 'of', 'seconds', 'for', 'a', 'distance', 'of', 'm', 'determine', 'the', 'acceleration', 'of', 'the', 'car', 'upton', 'chuck', 'is', 'riding', 'the', 'giant', 'drop', 'at', 'great', 'america', 'if', 'upton', 'free', 'falls', 'for', 'seconds', 'what', 'will', 'be', 'his', 'final', 'velocity', 'and', 'how', 'far', 'will', 'he', 'fall', 'a', 'race', 'car', 'accelerates', 'uniformly', 'from', 'ms', 'to', 'ms', 'in', 'seconds', 'determine', 'the', 'acceleration', 'of', 'the', 'car', 'and', 'the', 'distance', 'traveled', 'a', 'feather', 'is', 'dropped', 'on', 'the', 'moon', 'from', 'a', 'height', 'of', 'meters', 'the', 'acceleration', 'of', 'gravity', 'on', 'the', 'moon', 'is', 'determine', 'the', 'time',
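To see what the filter keeps and drops, the same steps can be wrapped in a function and run on the first sentence of the corpus. The function name `simple_tokens` is an assumption introduced for this sketch.

```python
import string

def simple_tokens(text):
    """Split on whitespace, strip punctuation, keep alphabetic tokens, lowercase."""
    table = str.maketrans('', '', string.punctuation)
    stripped = [w.translate(table) for w in text.split()]
    return [w.lower() for w in stripped if w.isalpha()]

sample = 'An airplane accelerates down a runway at 3.20 m/s2 for 32.8 s'
print(simple_tokens(sample))
# ['an', 'airplane', 'accelerates', 'down', 'a', 'runway', 'at', 'for', 's']

print(simple_tokens('from 18.5 m/s to 46.1 m/s'))
# ['from', 'ms', 'to', 'ms']
```

Note that "m/s2" becomes "ms2" after punctuation removal and is then discarded as non-alphabetic, while "m/s" survives as "ms".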

Extracting sequences of 4 consecutive tokens. Each sequence is later split into a trigram (the model input) and the single word that follows it (the target).

In [10]:
length = 3 + 1
sequences = []
for i in range(length, len(tokens)):
    # select a window of `length` consecutive tokens
    seq = tokens[i-length:i]
    # join the window into a single space-separated line
    line = ' '.join(seq)
    # store
    sequences.append(line)
print('Total Sequences: %d' % len(sequences))

Total Sequences: 8848
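The windowing can be illustrated on a toy token list. The helper `sliding_windows` below is an assumption made for this sketch. It differs from the loop above in one respect: the loop iterates over `range(length, len(tokens))`, which yields `len(tokens) - length` windows and never uses the very last token as a target, whereas the sketch iterates up to `len(tokens) + 1` so that the final window is included.

```python
def sliding_windows(tokens, length):
    """Return every run of `length` consecutive tokens, including the final one."""
    return [tokens[i - length:i] for i in range(length, len(tokens) + 1)]

toy = ['a', 'car', 'starts', 'from', 'rest', 'and']
windows = sliding_windows(toy, 4)
for w in windows:
    # the first three tokens are the input, the fourth is the word to predict
    print(w[:-1], '->', w[-1])
```

For six tokens and a window of four this gives 6 - 4 + 1 = 3 windows.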


In [11]:
sequences[1]

'airplane accelerates down a'

In [0]:
lines = data.split('\r\n')          

Encoding each word as an integer from 1 to the vocabulary size. Index 0 is not assigned to any word, which is why vocab_size adds 1.

In [0]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(lines)
sequences = tokenizer.texts_to_sequences(sequences)    

In [0]:
vocab_size = len(tokenizer.word_index) + 1
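The idea behind the word index can be sketched without Keras. The functions `build_word_index` and `encode` below are simplified stand-ins written for illustration, not the Keras implementation: they rank words by frequency starting at 1 and drop unknown words when encoding, but they skip the punctuation filtering that `Tokenizer` also performs.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(text, word_index):
    """Map known words to their indices and silently drop unknown words."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

toy_index = build_word_index(['the car', 'the moon'])
toy_vocab_size = len(toy_index) + 1   # +1 because index 0 is never assigned to a word
print(toy_index, toy_vocab_size)
print(encode('the rocket car', toy_index))   # 'rocket' is unknown and is dropped
```

Because unknown words are dropped, an encoded sequence can end up shorter than the text it came from.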

Sanity check for the sequences. 
Any sequence that comes out shorter than 4 tokens after encoding (1, 2 or 3 word ids) is padded to length 4 by adding zeros at the beginning.

In [0]:
for i in range(len(sequences)):
  if len(sequences[i])==1:
    sequences[i]=[0,0,0]+sequences[i]
  if len(sequences[i])==2:
    sequences[i]=[0,0]+sequences[i]
  if len(sequences[i])==3:
    sequences[i]=[0]+sequences[i]
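The three branches can be collapsed into a single helper. The function `pad_left` is an assumption introduced for this sketch; unlike the cell above, it also pads an empty sequence. Keras ships a `pad_sequences` utility in `keras.preprocessing.sequence` that pads at the front by default and could serve the same purpose.

```python
def pad_left(seq, length, pad_value=0):
    """Left-pad `seq` with `pad_value` up to `length`; longer sequences are returned unchanged."""
    return [pad_value] * max(0, length - len(seq)) + list(seq)

print(pad_left([19], 4))            # [0, 0, 0, 19]
print(pad_left([19, 206], 4))       # [0, 0, 19, 206]
print(pad_left([19, 206, 48], 4))   # [0, 19, 206, 48]
print(pad_left([], 4))              # [0, 0, 0, 0]
```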

Splitting each sequence into a trigram (input X) and the final word (target y).

In [0]:
X=[]
y=[]
for x in sequences:
  X.append(x[:-1])
  y.append(x[-1])

In [0]:
X=np.array(X)

In [18]:
X

array([[ 19, 206,  48],
       [206,  48, 114],
       [ 48, 114,   2],
       ...,
       [ 75, 369,   8],
       [369,   8,   1],
       [  8,   1, 185]])

In [0]:
y1 = to_categorical(y, num_classes=vocab_size)
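The split and the one-hot encoding can be reproduced with plain NumPy on a toy array. The values and the names `toy_sequences`, `toy_vocab_size`, `X_toy`, `y_toy` and `y_onehot` are made up for illustration; `np.eye(...)[y]` is used here as a stand-in for `to_categorical`.

```python
import numpy as np

# Hypothetical encoded 4-grams; the values are made up for illustration.
toy_sequences = np.array([[19, 206, 48, 114],
                          [206, 48, 114, 2]])
toy_vocab_size = 250

X_toy = toy_sequences[:, :-1]   # first three word ids of each row
y_toy = toy_sequences[:, -1]    # last word id of each row

# One-hot targets: row i has a single 1 at column y_toy[i].
y_onehot = np.eye(toy_vocab_size, dtype='float32')[y_toy]
print(X_toy.shape, y_onehot.shape)
```

For larger vocabularies, Keras's `sparse_categorical_crossentropy` loss accepts the integer targets directly, which may be worth considering to avoid materializing the one-hot matrix.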

Creating the model

In [20]:
model = Sequential()
model.add(Embedding(vocab_size, 100, input_length=3))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(100))
model.add(Dense(100, activation='relu'))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())

Instructions for updating:
Colocations handled automatically by placer.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 3, 100)            117400    
_________________________________________________________________
lstm_1 (LSTM)                (None, 3, 100)            80400     
_________________________________________________________________
lstm_2 (LSTM)                (None, 100)               80400     
_________________________________________________________________
dense_1 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_2 (Dense)              (None, 1174)              118574    
Total params: 406,874
Trainable params: 406,874
Non-trainable params: 0
_________________________________________________________________
None
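The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights and a bias) and takes the vocabulary size of 1174 from the output layer shape.

```python
vocab_size = 1174      # from the output layer shape in the summary
embed_dim = 100
units = 100

embedding = vocab_size * embed_dim
# An LSTM has 4 gates, each with input weights, recurrent weights and a bias.
lstm_1 = 4 * (units * (embed_dim + units) + units)
lstm_2 = 4 * (units * (units + units) + units)
dense_1 = units * 100 + 100                  # weights + biases
dense_2 = 100 * vocab_size + vocab_size      # weights + biases

total = embedding + lstm_1 + lstm_2 + dense_1 + dense_2
print(embedding, lstm_1, lstm_2, dense_1, dense_2, total)
# 117400 80400 80400 10100 118574 406874
```

Roughly 58% of the parameters sit in the embedding and the output layer, both of which scale with the vocabulary size.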


Model fitting

In [22]:
# compile model
model.compile(loss='categorical_crossentropy', optimizer= Adam(lr = 0.01), metrics=['accuracy'])
# fit model
model.fit(X,y1, batch_size=128, epochs=25)
 


Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x7f67d88579e8>

Creating a dictionary that maps each encoded index back to its word

In [0]:
indextoword={}
for word, index in tokenizer.word_index.items():
  indextoword[index]=word

Printing a few sentences generated by the model
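The cells that follow repeat the same greedy loop three times. One way to factor it out is sketched here; `generate`, `toy_predict` and `toy_words` are assumptions introduced for illustration, and the toy predictor merely stands in for the trained network so that the sketch runs on its own.

```python
def generate(predict_proba, seed_id, index_to_word, n_new=9, context=3):
    """Greedy generation: repeatedly pick the most probable next word id."""
    window = [0] * (context - 1) + [seed_id]   # left-padded starting context
    ids = [seed_id]
    for _ in range(n_new):
        probs = predict_proba(window)
        new = max(range(len(probs)), key=probs.__getitem__)   # argmax
        ids.append(new)
        window = window[1:] + [new]            # slide the context by one word
    return [index_to_word[i] for i in ids]

# Toy stand-in for the trained network: always prefers the id after the last one.
toy_words = {1: 'the', 2: 'car', 3: 'accelerates'}

def toy_predict(window):
    nxt = window[-1] % 3 + 1
    return [1.0 if i == nxt else 0.0 for i in range(4)]

print(generate(toy_predict, 1, toy_words, n_new=4))
# ['the', 'car', 'accelerates', 'the', 'car']
```

With the real model, the predictor could be something like `lambda w: model.predict(np.array([w]), verbose=0)[0]`. Taking the argmax of `predict` in this way also works on recent TensorFlow releases, which no longer provide `predict_classes`.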

In [24]:
word1 = randint(1, vocab_size - 1)   # randint is inclusive; valid word indices are 1..vocab_size-1
input_word = [0,0,word1]
generated_sentence = [word1]
for i in range(1,10):           # predict 9 new words
  new = model.predict_classes(np.array([input_word]), verbose=0)[0]
  generated_sentence+= [new]
  input_word = input_word[1:]+[new]
for x in range(len(generated_sentence)):
  generated_sentence[x]=indextoword[generated_sentence[x]]


str(generated_sentence)

"['long', 'in', 'seconds', 'with', 'a', 'final', 'velocity', 'of', 'ms', 'how']"

In [25]:
word1 = randint(1, vocab_size - 1)   # randint is inclusive; valid word indices are 1..vocab_size-1
input_word = [0,0,word1]
generated_sentence = [word1]
for i in range(1,10):           # predict 9 new words
  new = model.predict_classes(np.array([input_word]), verbose=0)[0]
  generated_sentence+= [new]
  input_word = input_word[1:]+[new]
for x in range(len(generated_sentence)):
  generated_sentence[x]=indextoword[generated_sentence[x]]


str(generated_sentence)

"['skid', 'you', 'documents', 'phone', 'while', 'kmh', 'over', 'a', 'distance', 'of']"

In [26]:
word1 = randint(1, vocab_size - 1)   # randint is inclusive; valid word indices are 1..vocab_size-1
input_word = [0,0,word1]
generated_sentence = [word1]
for i in range(1,10):           # predict 9 new words
  new = model.predict_classes(np.array([input_word]), verbose=0)[0]
  generated_sentence+= [new]
  input_word = input_word[1:]+[new]
for x in range(len(generated_sentence)):
  generated_sentence[x]=indextoword[generated_sentence[x]]


str(generated_sentence)

"['reached', 'the', 'cup', 'seconds', 'what', 'is', 'the', 'acceleration', 'of', 'the']"