### Optimizing an RNN using a Genetic Algorithm
Code and figures are based on [Using Genetic Algorithm for optimizing Recurrent Neural Network](http://aqibsaeed.github.io/2017-08-11-genetic-algorithm-for-optimizing-rnn/), adapted to run on Google Colaboratory.
The data file train.csv must be in your Google Drive at "/My Drive/train.csv".

#### Importing required packages

In [None]:
!pip install -U deap bitstring
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split as split

from keras.layers import LSTM, Input, Dense
from keras.models import Model

from deap import base, creator, tools, algorithms
from scipy.stats import bernoulli
from bitstring import BitArray

np.random.seed(1120)


Using TensorFlow backend.


#### Mounting Google Drive to access the data file train.csv

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


#### Reading the dataset: the first 17,257 points are used for training/validation and the remaining 1,500 points as the test set

In [None]:
data = pd.read_csv('/content/drive/My Drive/train.csv')
data = np.reshape(np.array(data['wp1']),(len(data['wp1']),1))

train_data = data[0:17257]
test_data = data[17257:]

#### Defining helper functions

In [None]:
def prepare_dataset(data, window_size):
    """Creates a matrix of sliding windows and targets from an array of data
    
    data = [[1], [2], [3], [4], [5], [6]], window_size = 3 will become
    X = [[[1],[2],[3]],[[2],[3],[4]]]
    Y = [[4],[5]]
    The loop stops one step early, so the last possible window
    ([3, 4, 5] -> 6 in this example) is not generated.

    Params:
    data: two-dimensional np.array of shape (n_samples, 1)
    window_size: number of elements in each window (row of data)
    Returns:
    X: three-dimensional np.array of shape (n_windows, window_size, 1)
    Y: two-dimensional np.array of targets of shape (n_windows, 1)
    Both outputs carry the trailing axis that Keras expects
    """
    
    X, Y = np.empty((0,window_size)), np.empty((0))
    for i in range(len(data)-window_size-1):
        X = np.vstack([X,data[i:(i + window_size),0]])
        Y = np.append(Y,data[i + window_size,0])   
    X = np.reshape(X,(len(X),window_size,1))
    Y = np.reshape(Y,(len(Y),1))
    return X, Y

def train_evaluate(ga_individual_solution):   
    # Determine the fitness of a chromosome

    # Decode the chromosome: the first 6 bits give window_size (1 to 64),
    # the remaining bits give num_units (1 to 16 for a 10-bit chromosome).
    # The +1 offset keeps both values strictly positive, so no decoded
    # configuration is invalid.
    window_size_bits = BitArray(ga_individual_solution[0:6])
    num_units_bits = BitArray(ga_individual_solution[6:])
    window_size = window_size_bits.uint + 1
    num_units = num_units_bits.uint + 1
    print('\nWindow Size: ', window_size, ', Num of Units: ', num_units)
    
    # Segment train_data using the decoded window_size,
    # then split into train and validation sets (80/20)
    X,Y = prepare_dataset(train_data,window_size)
    X_train, X_val, y_train, y_val = split(X, Y, test_size=0.20,
                                           random_state=1120)
    
    # Train the LSTM model and predict on the validation set
    inputs = Input(shape=(window_size,1))
    x = LSTM(num_units, input_shape=(window_size,1))(inputs)
    predictions = Dense(1, activation='linear')(x)
    model = Model(inputs=inputs, outputs=predictions)
    model.compile(optimizer='adam',loss='mean_squared_error')
    model.fit(X_train, y_train, epochs=5, batch_size=10,shuffle=True, verbose=0)
    y_pred = model.predict(X_val)
    
    # Use the validation RMSE as the fitness score for this chromosome
    rmse = np.sqrt(mean_squared_error(y_val, y_pred))
    print('Validation RMSE: ', rmse,'\n')
    
    return rmse,
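
The windowing performed by `prepare_dataset` can also be expressed with NumPy index arithmetic, which avoids growing the arrays one row at a time. The following is a minimal sketch, not the notebook's implementation: the name `make_windows` is hypothetical, and it deliberately produces the same number of windows as the loop above (`len(data) - window_size - 1`).

```python
import numpy as np


def make_windows(data, window_size):
    """Vectorized sketch of sliding-window construction (hypothetical helper).

    data: np.array of shape (n_samples, 1)
    Returns X of shape (n_windows, window_size, 1) and Y of shape (n_windows, 1),
    with n_windows = len(data) - window_size - 1, as in prepare_dataset.
    """
    series = np.asarray(data, dtype=float)[:, 0]
    n_windows = max(len(series) - window_size - 1, 0)
    # Row i of idx holds the positions i, i+1, ..., i+window_size-1
    idx = np.arange(n_windows)[:, None] + np.arange(window_size)[None, :]
    X = series[idx].reshape(n_windows, window_size, 1)
    # The target of window i is the value right after it
    Y = series[window_size:window_size + n_windows].reshape(n_windows, 1)
    return X, Y


demo = np.arange(1, 7).reshape(-1, 1)  # [[1], [2], ..., [6]]
X_demo, Y_demo = make_windows(demo, window_size=3)
print(X_demo[:, :, 0])  # windows [1, 2, 3] and [2, 3, 4]
print(Y_demo[:, 0])     # targets 4 and 5
```

Because the index matrix is built once, the cost grows linearly with the number of windows, whereas repeated `np.vstack` calls copy the whole array on every iteration.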

#### Genetic Representation of the Solution

<img src="https://raw.githubusercontent.com/jmacostap/webstore/master/genetic_representation.png" alt="Genetic representation of a solution">
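
As decoded in `train_evaluate`, each solution is a list of 10 bits: the first 6 encode the window size and the last 4 the number of LSTM units, each read as an unsigned integer plus one. The sketch below reproduces that decoding in plain Python so it can be tried without `bitstring`; `bits_to_uint` and `decode_chromosome` are hypothetical names, and the most-significant-bit-first order is assumed to match `BitArray(...).uint`.

```python
def bits_to_uint(bits):
    """Interpret a list of 0/1 values as an unsigned integer, most significant bit first."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


def decode_chromosome(chromosome, window_bits=6):
    """Hypothetical helper: split a bit list into (window_size, num_units)."""
    window_size = bits_to_uint(chromosome[:window_bits]) + 1
    num_units = bits_to_uint(chromosome[window_bits:]) + 1
    return window_size, num_units


print(decode_chromosome([0, 1, 0, 1, 1, 0, 1, 0, 0, 1]))  # (23, 10)
print(decode_chromosome([0] * 10))                        # (1, 1)
print(decode_chromosome([1] * 10))                        # (64, 16)
```

With 6 and 4 bits, the search space covers window sizes 1 to 64 and 1 to 16 units, which gives 1,024 possible configurations.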

#### Genetic Algorithm Overview

<img src="https://raw.githubusercontent.com/jmacostap/webstore/master/ga.png" alt="Genetic Algorithm">
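
The loop in the diagram can be sketched in plain Python. This is an illustrative generational loop, not DEAP's `eaSimple` and not the operators registered in this notebook: it assumes tournament selection, one-point crossover and bit-flip mutation, and a toy fitness (`count_ones`, to be minimized) that stands in for the validation RMSE. All function names are hypothetical.

```python
import random


def count_ones(bits):
    """Toy fitness to minimize; stands in for the validation RMSE."""
    return sum(bits)


def tournament(population, fitness, rng, size=3):
    """Return a copy of the fittest (lowest score) of `size` random individuals."""
    return list(min(rng.sample(population, size), key=fitness))


def evolve(fitness, gene_length=10, pop_size=5, generations=10,
           cxpb=0.4, mutpb=0.1, seed=1120):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(gene_length)]
                  for _ in range(pop_size)]
    best = list(min(population, key=fitness))
    for _ in range(generations):
        # Selection: build the next generation from tournament winners
        offspring = [tournament(population, fitness, rng) for _ in range(pop_size)]
        # Crossover: swap the tails of neighbouring pairs with probability cxpb
        for a, b in zip(offspring[::2], offspring[1::2]):
            if rng.random() < cxpb:
                point = rng.randint(1, gene_length - 1)
                a[point:], b[point:] = b[point:], a[point:]
        # Mutation: with probability mutpb, flip one random bit
        for ind in offspring:
            if rng.random() < mutpb:
                pos = rng.randrange(gene_length)
                ind[pos] = 1 - ind[pos]
        population = offspring
        # Keep track of the best individual seen so far
        candidate = min(population, key=fitness)
        if fitness(candidate) < fitness(best):
            best = list(candidate)
    return best


best = evolve(count_ones)
print(best, count_ones(best))
```

Tracking the best individual separately matters because a purely generational loop replaces the whole population each round and can lose a good solution; DEAP provides `tools.HallOfFame` for the same purpose.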

In [None]:
population_size = 5  
num_generations = 10
gene_length = 10

# We are trying to minimize the RMSE score, so the fitness weight is -1.0.
# To maximize a metric instead (accuracy, for instance), use 1.0.
creator.create('FitnessMin', base.Fitness, weights = (-1.0,))
creator.create('Individual', list , fitness = creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register('binary', bernoulli.rvs, 0.5)
toolbox.register('individual', tools.initRepeat, creator.Individual,
                 toolbox.binary, n = gene_length)
toolbox.register('population', tools.initRepeat, list , toolbox.individual)

toolbox.register('mate', tools.cxOrdered)
toolbox.register('mutate', tools.mutShuffleIndexes, indpb = 0.6)
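# Note: DEAP's documentation warns that roulette selection is not suited to
# minimization problems; tools.selTournament (which takes a tournsize
# argument) is a commonly used alternative.
toolbox.register('select', tools.selRoulette)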
toolbox.register('evaluate', train_evaluate)

population = toolbox.population(n = population_size)
r = algorithms.eaSimple(population, toolbox, cxpb = 0.4, mutpb = 0.1,
                        ngen = num_generations, verbose = False)




Window Size:  23 , Num of Units:  15
Validation RMSE:  0.08127731616725946 


Window Size:  3 , Num of Units:  13
Validation RMSE:  0.07272281737700219 


Window Size:  41 , Num of Units:  2
Validation RMSE:  0.08037590683821653 


Window Size:  52 , Num of Units:  11
Validation RMSE:  0.08152766954595342 


Window Size:  27 , Num of Units:  10
Validation RMSE:  0.07651852088449959 


Window Size:  27 , Num of Units:  10
Validation RMSE:  0.07645777893942324 


Window Size:  27 , Num of Units:  10
Validation RMSE:  0.076602796635138 


Window Size:  23 , Num of Units:  10
Validation RMSE:  0.07622665086198063 


Window Size:  27 , Num of Units:  15
Validation RMSE:  0.07661153060688441 


Window Size:  23 , Num of Units:  10
Validation RMSE:  0.07754251591425984 


Window Size:  27 , Num of Units:  10
Validation RMSE:  0.0765740173810767 


Window Size:  23 , Num of Units:  10
Validation RMSE:  0.07529252670421877 


Window Size:  41 , Num of Units:  12
Validation RMSE:  0.08084829174

KeyboardInterrupt: ignored

#### Print top solutions

In [None]:
# You can adjust k depending on how many solutions you want to display
best_individuals = tools.selBest(population, k=3)
best_window_size = None
best_num_units = None

for bi in best_individuals:
    # Decode with the same +1 offset that train_evaluate applies
    window_size = BitArray(bi[0:6]).uint + 1
    num_units = BitArray(bi[6:]).uint + 1
    print('\nWindow Size: ', window_size, ', Num of Units: ', num_units)
    # selBest returns the individuals sorted best first: keep the first one
    if best_window_size is None:
        best_window_size, best_num_units = window_size, num_units


Window Size:  26 , Num of Units:  9

Window Size:  26 , Num of Units:  9

Window Size:  26 , Num of Units:  9


#### Train the model using best configuration on complete training set and make predictions on the test set

In [None]:
X_train,y_train = prepare_dataset(train_data,best_window_size)
X_test, y_test = prepare_dataset(test_data,best_window_size)

inputs = Input(shape=(best_window_size,1))
x = LSTM(best_num_units, input_shape=(best_window_size,1))(inputs)
predictions = Dense(1, activation='linear')(x)
model = Model(inputs = inputs, outputs = predictions)
model.compile(optimizer='adam',loss='mean_squared_error')
model.fit(X_train, y_train, epochs=5, batch_size=10,shuffle=True)
y_pred = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print('Test RMSE: ', rmse)
# Window Size:  49 , Num of Units:  9
# Test RMSE:  0.0931164056025094
# Window Size:  26 , Num of Units:  9
# Test RMSE:  0.09180271941509333

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test RMSE:  0.09180271941509333
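
For reference, the score reported above is the root mean squared error, $\sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$, expressed in the same units as the `wp1` series. Below is a minimal plain-Python sketch of the quantity that `np.sqrt(mean_squared_error(...))` computes for one-dimensional targets; the function name `rmse_score` is hypothetical.

```python
import math


def rmse_score(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(rmse_score([0.2, 0.4, 0.6], [0.2, 0.4, 0.6]))  # 0.0
print(rmse_score([0.0, 0.0], [3.0, 4.0]))            # sqrt(12.5), about 3.54
```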
