# Assignment 04 Neural networks

Exercise 1: Using RNNs/LSTMs to generate Python code

Using TensorFlow, design and develop an RNN/LSTM model for generating fake Python code (functions, etc.). 
Use any of the Python repos freely available. Analyze the accuracy of the model and point out some of the pitfalls. 
You may refer to the article below for the LSTM net architecture.

In [1]:
# Import libraries
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.utils import to_categorical

In [2]:
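# Load the sample Python file; the context manager closes the file handle.
with open("./sample.py", "r", encoding="utf-8") as f:
    raw_text = f.read()
# Lowercasing reduces the vocabulary but discards case information.
raw_text = raw_text.lower()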

In [3]:
# create mapping of unique chars to integers
chars = sorted(set(raw_text))
char_to_int = {c: i for i, c in enumerate(chars)}
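
Turning model predictions back into text needs the inverse of this mapping. The sketch below builds it on a toy string; `sample_text` and `int_to_char` are illustrative names that do not appear in the notebook.

```python
# Toy text standing in for raw_text (assumption: any string behaves the same way).
sample_text = "def f(x):\n    return x\n"

chars = sorted(set(sample_text))
char_to_int = {c: i for i, c in enumerate(chars)}
# Reverse lookup: integer index -> character (hypothetical helper name).
int_to_char = {i: c for c, i in char_to_int.items()}

# Round trip: encode to integers, then decode back to the original text.
encoded = [char_to_int[c] for c in sample_text]
decoded = "".join(int_to_char[i] for i in encoded)
print(decoded == sample_text)  # True
```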

In [4]:
# Summarize the dataset
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)

Total Characters:  19130
Total Vocab:  62


In [5]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 20
dataX = []
dataY = []
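# Slide a window over the text: each input is seq_length characters,
# and the target is the single character that follows the window.
for i in range(n_chars - seq_length):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])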
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  19110
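
The pattern count follows directly from the windowing: a text of `n_chars` characters yields `n_chars - seq_length` windows, here 19130 - 20 = 19110. A minimal sketch on a short string makes this easy to check by hand; `make_windows` is a hypothetical helper, not part of the notebook.

```python
def make_windows(text, seq_length):
    """Return (inputs, targets): each input is seq_length characters,
    each target is the character immediately after that window."""
    inputs, targets = [], []
    for i in range(len(text) - seq_length):
        inputs.append(text[i:i + seq_length])
        targets.append(text[i + seq_length])
    return inputs, targets

# 13 characters with a window of 4 gives 13 - 4 = 9 patterns.
inputs, targets = make_windows("def f(): pass", 4)
print(len(inputs))                    # 9
print(repr(inputs[0]), repr(targets[0]))  # 'def ' 'f'
```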


In [6]:
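# Reshape to [samples, time steps, features], the layout the LSTM layer expects
X = np.reshape(dataX, (n_patterns, seq_length, 1))
# Scale the integer codes into the range [0, 1)
X = X / float(n_vocab)
# One-hot encode the target characters
y = to_categorical(dataY)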
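
To see what shapes this step produces, the sketch below repeats it on a tiny hand-made dataset using NumPy only. `np.eye(n_vocab)[dataY]` stands in for `to_categorical` here (an assumption made to keep the example free of TensorFlow), and the toy values are invented for illustration.

```python
import numpy as np

# Toy stand-ins for the notebook's variables (values are made up).
n_vocab = 5
seq_length = 3
dataX = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
dataY = [3, 4, 0]
n_patterns = len(dataX)

# [samples, time steps, features], scaled into [0, 1).
X = np.reshape(dataX, (n_patterns, seq_length, 1)) / float(n_vocab)
# One-hot targets with a fixed width of n_vocab columns.
y = np.eye(n_vocab)[dataY]

print(X.shape, y.shape)  # (3, 3, 1) (3, 5)
```

One design point worth noting: `to_categorical` infers the number of classes from the largest label when `num_classes` is not given, so passing `num_classes=n_vocab` may be a safer choice if some character never occurs as a target.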

In [7]:
# Build a stacked two-layer LSTM with dropout and a softmax output over the vocabulary
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
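
Because the model is compiled without an accuracy metric, the cross-entropy loss is the only number reported during training. A rough way to read it, sketched below, is to compare it with the loss of a model that guesses uniformly over the 62 characters, which is ln(62), and to convert a loss into perplexity via exp(loss). The helper name `perplexity` is hypothetical.

```python
import math

n_vocab = 62  # vocabulary size reported by the notebook

# Cross-entropy (in nats) of a uniform guess over the vocabulary.
uniform_loss = math.log(n_vocab)

def perplexity(loss):
    """Convert a cross-entropy loss in nats to perplexity: roughly the
    number of characters the model is still choosing between."""
    return math.exp(loss)

print(round(uniform_loss, 3))  # 4.127
print(perplexity(uniform_loss))
```

A loss well below 4.127 therefore indicates the model has learned something beyond random guessing, although a low training loss alone says nothing about whether generated code is valid Python.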

In [8]:
# Save a checkpoint each time the training loss improves
filepath = "lstm_weighs-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
model.fit(X, y, epochs=50, batch_size=32, callbacks=callbacks_list)

Epoch 1/50
Epoch 1: loss improved from inf to 2.91630, saving model to lstm_weighs-01-2.9163.hdf5
Epoch 2/50
Epoch 2: loss improved from 2.91630 to 2.75591, saving model to lstm_weighs-02-2.7559.hdf5
Epoch 3/50
Epoch 3: loss improved from 2.75591 to 2.67131, saving model to lstm_weighs-03-2.6713.hdf5
Epoch 4/50
Epoch 4: loss improved from 2.67131 to 2.55957, saving model to lstm_weighs-04-2.5596.hdf5
Epoch 5/50
Epoch 5: loss improved from 2.55957 to 2.43163, saving model to lstm_weighs-05-2.4316.hdf5
Epoch 6/50
Epoch 6: loss improved from 2.43163 to 2.29817, saving model to lstm_weighs-06-2.2982.hdf5
Epoch 7/50
Epoch 7: loss improved from 2.29817 to 2.16905, saving model to lstm_weighs-07-2.1690.hdf5
Epoch 8/50
Epoch 8: loss improved from 2.16905 to 2.03774, saving model to lstm_weighs-08-2.0377.hdf5
Epoch 9/50
Epoch 9: loss improved from 2.03774 to 1.91263, saving model to lstm_weighs-09-1.9126.hdf5
Epoch 10/50
Epoch 10: loss improved from 1.91263 to 1.79259, saving model to lstm_weig

<keras.callbacks.History at 0x14b232bb310>
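
The notebook stops after training, so the following is only a sketch of how text could be generated from such a model, not the author's method. It assumes a callable `predict_fn` (hypothetical) that maps an input of shape `(1, window_length, 1)` to next-character probabilities of shape `(1, n_vocab)`; with the trained network this could be `lambda x: model.predict(x, verbose=0)`. Temperature sampling is an added choice, and the uniform `dummy_predict` below exists only so the example runs without TensorFlow.

```python
import numpy as np

def sample_index(probs, temperature=1.0, rng=None):
    """Draw one index from probs after temperature rescaling.
    Lower temperature makes the most likely character more dominant."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-9) / temperature
    scaled = np.exp(logits - logits.max())
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

def generate(predict_fn, seed, char_to_int, n_chars_out, temperature=1.0, rng=None):
    """Extend seed by n_chars_out characters, one character at a time."""
    int_to_char = {i: c for c, i in char_to_int.items()}
    n_vocab = len(char_to_int)
    window = [char_to_int[c] for c in seed]
    out = []
    for _ in range(n_chars_out):
        # Same layout and scaling as the training inputs.
        x = np.reshape(window, (1, len(window), 1)) / float(n_vocab)
        probs = predict_fn(x)[0]
        idx = sample_index(probs, temperature, rng)
        out.append(int_to_char[idx])
        # Slide the window forward by one character.
        window = window[1:] + [idx]
    return seed + "".join(out)

# Demo with a dummy predictor that returns uniform probabilities.
vocab = {c: i for i, c in enumerate(sorted(set("def x():\n")))}
dummy_predict = lambda x: np.full((1, len(vocab)), 1.0 / len(vocab))
text = generate(dummy_predict, "def ", vocab, 10, rng=np.random.default_rng(0))
print(repr(text))
```

With the model trained above, the seed would need to be exactly `seq_length` (20) characters long, lowercased, and made up only of characters present in `char_to_int`. Since the training text was lowercased, output generated this way can never contain capitalized tokens such as `True` or `None`, which is one pitfall to keep in mind when judging the results.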