### Evolutionary Design of Artificial Neural Networks (DERNA)
### Prototype
Based on https://github.com/aqibsaeed/Genetic-Algorithm-RNN/blob/master/Genetic-Algorithm-RNN.ipynb

For the prototype, we want to build a simple sequential neural network that predicts the trend of the S&P 500.
It uses part of the daily closing data of the S&P 500 (GSPC) from
1993 to 2019.
The data are rearranged into a sliding window that serves as the input to the ANN.
The goal of the prototype is to optimize the input window size and the number of hidden neurons, producing a generalizable code base for DERNA, which should eventually be able to optimize a larger number of a model's hyperparameters.
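A sliding window turns the daily table into supervised samples: each sample flattens W consecutive days, and its target is the value for the following day. The sketch below builds all such samples without a Python loop, assuming NumPy 1.20 or later for `numpy.lib.stride_tricks.sliding_window_view`; `windowed_dataset` is a hypothetical helper name, and it reads the number of columns per day from the array instead of assuming a fixed count.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20


def windowed_dataset(series, targets, window_size):
    """Flatten every window of `window_size` consecutive rows into one sample.

    Sample i holds rows i .. i + window_size - 1 (row by row) and is paired
    with targets[i + window_size], the value for the following day.
    """
    series = np.asarray(series, dtype=float)
    targets = np.asarray(targets, dtype=float)
    num_samples = len(series) - window_size  # the last window has no next day
    # Shape (n - W + 1, n_cols, W): the window axis is appended at the end
    windows = sliding_window_view(series, window_size, axis=0)
    # Reorder to (samples, W, n_cols) so each day's columns stay together
    X = windows[:num_samples].transpose(0, 2, 1).reshape(num_samples, -1)
    Y = targets[window_size:].reshape(-1, 1)
    return X, Y


# Toy data: 6 days with 2 columns each
toy = np.arange(12, dtype=float).reshape(6, 2)
X, Y = windowed_dataset(toy, toy[:, 0], window_size=2)
print(X.shape, Y.shape)  # (4, 4) (4, 1)
print(X[0], Y[0])        # [0. 1. 2. 3.] [4.]
```

With 10 columns per day, as in the GSPC file, each sample has `window_size * 10` inputs.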
#### Prototype objectives:
- Create the GSPC input file
- Read the input file directly from the Internet
- Preprocess the input data
- Create a W-N-1 ANN that takes a window of W days,
  uses N hidden neurons, and predicts the next day's trend
- Implement the mapping
  - Create a genome as an array of real numbers
  - Implement reproduction by alpha blend
  - Implement mutation
- Document everything following the PEP guidelines
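The objectives call for a real-valued genome with alpha-blend (BLX-α) crossover and mutation, while the code below still works on a bit string. A minimal NumPy sketch of both operators follows; the function names, the defaults `alpha=0.5`, `sigma=0.1` and `indpb=0.2`, and the assumption that every gene lives in [0, 1] are illustrative choices, not part of the prototype. DEAP also ships `tools.cxBlend` and `tools.mutGaussian`, which could play these roles once the genome is real-valued.

```python
import numpy as np


def blend_crossover(parent1, parent2, alpha=0.5, rng=None):
    """BLX-alpha: draw each child gene uniformly from the parents' interval,
    widened by alpha times its length on both sides."""
    rng = np.random.default_rng() if rng is None else rng
    p1 = np.asarray(parent1, dtype=float)
    p2 = np.asarray(parent2, dtype=float)
    lo, hi = np.minimum(p1, p2), np.maximum(p1, p2)
    spread = hi - lo
    return rng.uniform(lo - alpha * spread, hi + alpha * spread)


def gaussian_mutation(genome, sigma=0.1, indpb=0.2, rng=None):
    """Add N(0, sigma^2) noise to each gene independently with probability indpb."""
    rng = np.random.default_rng() if rng is None else rng
    mutated = np.asarray(genome, dtype=float).copy()
    mask = rng.random(mutated.shape) < indpb
    mutated[mask] += rng.normal(0.0, sigma, size=int(mask.sum()))
    return mutated


# Usage: genes assumed to live in [0, 1], so clip after variation
rng = np.random.default_rng(0)
child = blend_crossover([0.2, 0.8], [0.4, 0.5], rng=rng)
child = np.clip(gaussian_mutation(child, rng=rng), 0.0, 1.0)
print(child)
```

A larger `alpha` lets children leave the parents' interval more often, favoring exploration; `alpha=0` restricts them to the segment between the parents.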

### Install packages that Colab does not include

In [None]:
!pip install -U deap bitstring

Collecting deap
  Downloading https://files.pythonhosted.org/packages/0a/eb/2bd0a32e3ce757fb26264765abbaedd6d4d3640d90219a513aeabd08ee2b/deap-1.3.1-cp36-cp36m-manylinux2010_x86_64.whl (157kB)
Collecting bitstring
  Downloading https://files.pythonhosted.org/packages/c3/fc/ffac2c199d2efe1ec5111f55efeb78f5f2972456df6939fea849f103f9f5/bitstring-3.1.7.tar.gz (195kB)
Building wheels for collected packages: bitstring
  Building wheel for bitstring (setup.py) ... done
  Created wheel for bitstring: filename=bitstring-3.1.7-cp36-none-any.whl size=37948 sha256=888d4834c921de71776df15789e7d52a3ec61755913d33b8770e259fcc893e4b
  Stored in directory: /root/.cache/pip/wheels/b8/27/f0/8373e26b7de57db03dc18aaaebdd8c26a99da882416f762979
Successfully built bitstring
Installing collected packages: deap, bitstring
Successfully installed bitstring-3.1.7 deap-1.3

#### Import packages

In [None]:
import numpy as np
import pandas as pd

from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

import tensorflow as tf
from tensorflow.keras.layers import LSTM, Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.losses import MeanAbsolutePercentageError
from tensorflow.random import set_seed

from deap import base, creator, tools, algorithms
from scipy.stats import bernoulli
from bitstring import BitArray

#### Configuration parameters
The idea is to gather as many configurable values as possible here.
To change the architecture, however, it is easier to edit the code below. Eventually DERNA should accept lists of values for the parameters and optimize over them. For now, only the window size and the number of neurons are optimized.
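One possible way to let DERNA optimize over lists of candidate values is to give each hyperparameter one real gene in [0, 1] and map it to an index into its list. The sketch below assumes that design; `SEARCH_SPACE`, its candidate lists and `decode_genome` are hypothetical names and values, chosen so that the window and neuron ranges match the current 6-bit and 4-bit encodings.

```python
import numpy as np

# Hypothetical search space: each hyperparameter lists its candidate values
SEARCH_SPACE = {
    "window_size": list(range(1, 65)),   # 1..64
    "num_neurons": list(range(1, 17)),   # 1..16
    "activation": ["tanh", "relu", "sigmoid"],
}


def decode_genome(genome, space=SEARCH_SPACE):
    """Map one real gene per hyperparameter to one of its candidate values."""
    if len(genome) != len(space):
        raise ValueError("genome length must match the number of hyperparameters")
    config = {}
    for gene, (name, values) in zip(genome, space.items()):
        # Scale the gene to an index and clamp it, so genes outside
        # [0, 1) (e.g. exactly 1.0 after mutation) still map to a value
        index = min(max(int(gene * len(values)), 0), len(values) - 1)
        config[name] = values[index]
    return config


print(decode_genome(np.array([0.0, 0.5, 1.0])))
# {'window_size': 1, 'num_neurons': 9, 'activation': 'sigmoid'}
```

Adding a hyperparameter then means adding one entry to the dictionary and one gene to the genome, without changing the decoding code.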

In [None]:
parameters = {
    "num_samples" : 100, # how many samples to use out of the 23193 available
    "activation" : "tanh",
    "output_activation" : "tanh",
    "epochs" : 5000,  # training epochs
    "batch_size" : 1,
    "shuffle" : True,
    "verbose" : 1,
    "test_size" : 0.20,  # 20% of samples for test data
    "optimizer" : "adam",
    "loss" : "mse",
    "fitness" : tf.keras.losses.MeanSquaredError(),
    "population_size" : 6,
    "num_generations" : 3,
}

#### Read the input data directly from the GitHub repository

In [None]:
# Download the CSV once and cache it under /content (Colab's working directory)
path_to_downloaded_file = tf.keras.utils.get_file(
    "GSPC.csv",
    "https://raw.githubusercontent.com/jmacostap/webstore/master/GSPC.csv",
    cache_dir="/content",
    )
# Alternative with pandas: data = pd.read_csv(path_to_downloaded_file)
# np.loadtxt requires every column to be numeric; skip the header row
data = np.loadtxt(path_to_downloaded_file, skiprows=1, delimiter=",")
print("Read ", len(data), " rows")

Read  23193  rows


#### Split the data: 80% for training and 20% for testing

In [None]:
# Keep only the last num_samples rows
data = data[-parameters["num_samples"]:, :]

# Compute the targets.
# Alternative, currently disabled: binary trend labels (1 = up, 0 = down)
# targets = [1]
# for i in range(len(data) - 1):
#     if data[i, 3] < data[i + 1, 3]:
#         targets.append(1)  # up
#     else:
#         targets.append(0)  # down
# Active version: the raw (unscaled) value in column 3 of each day
targets = [data[i, 3] for i in range(len(data))]

# Normalize to [0, 1] (currently disabled)
# scaler = MinMaxScaler()
# data = scaler.fit_transform(data)

# Hold out the last 20% of the rows (chronologically) for testing
twentypercent = len(data) // 5
train_data = data[:-twentypercent]
train_targets = targets[:-twentypercent]
test_data = data[-twentypercent:]
test_targets = targets[-twentypercent:]
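The prototype's stated goal is to predict the trend, and the disabled loop above computes binary up/down labels from column 3. A vectorized sketch of the same labeling follows; `trend_targets` is a hypothetical helper, and it reproduces the loop's conventions: the first day is labeled 1, and an unchanged value counts as "down" (0).

```python
import numpy as np


def trend_targets(close):
    """Label each day 1 if its value rose versus the previous day, else 0.

    The first day has no previous value and is labeled 1, as in the
    commented-out loop; a tie (no change) is labeled 0.
    """
    close = np.asarray(close, dtype=float)
    rises = (np.diff(close) > 0).astype(int)
    return np.concatenate(([1], rises))


closes = [10.0, 11.0, 10.5, 10.5, 12.0]
print(trend_targets(closes))  # [1 1 0 0 1]
```

In the notebook it would be applied as `targets = list(trend_targets(data[:, 3]))` before the train/test split.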

#### Definition of helper functions

In [None]:
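def prepare_dataset(time_series, targets, window_size):
    """Build sliding-window samples and their targets.

    Args:
        time_series (np.ndarray): daily data, one row (10 columns) per day.
        targets (list): one target per day.
        window_size (int): number of consecutive days W per sample.

    Returns:
        X (np.ndarray): shape (samples, window_size * 10); each row holds
            W consecutive days flattened.
        Y (np.ndarray): shape (samples, 1); the target of the day that
            follows each window.
    """
    X, Y = [], []
    # Every window whose following day exists yields one sample
    for i in range(len(time_series) - window_size):
        X.append(np.ravel(time_series[i:(i + window_size)]))
        Y.append(targets[i + window_size])
    X = np.reshape(X, (len(X), window_size * 10))
    Y = np.reshape(Y, (len(Y), 1))
    return X, Y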


def create_model(window_size, num_neurons, X_train, y_train):
    """Build, compile, and train a W-N-1 network.

    Args:
        window_size (int): window size W; each day has 10 columns, so the
            input layer receives window_size * 10 values.
        num_neurons (int): number of hidden neurons N (the value the
            genetic algorithm optimizes).
        X_train (np.ndarray): training inputs, shape (samples, window_size * 10).
        y_train (np.ndarray): training targets, shape (samples, 1).

    Returns:
        tf.keras.Model: the trained model.
    """
    # One hidden layer with N neurons and a single output neuron
    model = tf.keras.Sequential([
        Dense(num_neurons, activation=parameters["activation"],
              input_shape=(window_size * 10,)),
        Dense(1, activation=parameters["output_activation"])
    ])
    model.compile(optimizer=parameters["optimizer"],
                  metrics=["mse", "mae", "accuracy"],
                  loss=parameters["loss"])

    # Train the model
    model.fit(X_train, y_train, epochs=parameters["epochs"],
              batch_size=parameters["batch_size"],
              shuffle=parameters["shuffle"], verbose=parameters["verbose"])
    return model


def evaluate_model(model, X_val, y_val):
    """Predict on X_val and return the error given by parameters["fitness"]."""

    # Evaluate the trained model
    y_pred = model.predict(X_val)

    # Compute and display the error
    error_value = parameters["fitness"](y_val, y_pred).numpy()
    print('Validation error: ', error_value)
    return error_value


def evaluate_chromosome(chromosome) -> tuple:
    """Build, train, and evaluate the ANN encoded by a chromosome.

    The chromosome's values are decoded, then the ANN is built,
    trained, and evaluated.

    Args:
        chromosome (list of int): 10 bits; the first 6 encode the window
            size (window_size) and the remaining 4 the number of hidden
            neurons.

    Returns:
        tuple: (1 - mse,), the one-element tuple DEAP expects, where mse
            is the mean squared error on the validation split.
    """

    # Make results reproducible
    np.random.seed(31416)  # numpy
    set_seed(31416)  # tf.keras

    # Interpret the bits as unsigned integers
    window_size_bits = BitArray(chromosome[0:6])
    num_neurons_bits = BitArray(chromosome[6:])
    window_size = window_size_bits.uint + 1  # 0 is invalid, so range is 1:64
    num_neurons = num_neurons_bits.uint + 1  # 0 is invalid, so range is 1:16
    print('\nWindow: ', window_size, ', Neurons: ', num_neurons)

    # Build the windows, then keep the last test_size fraction
    # (chronologically, without shuffling) for validation
    X, Y = prepare_dataset(train_data, train_targets, window_size)
    split = int(len(X) * (1 - parameters["test_size"]))
    X_train, X_val = X[:split], X[split:]
    y_train, y_val = Y[:split], Y[split:]
    model = create_model(window_size, num_neurons, X_train, y_train)
    fitness = evaluate_model(model, X_val, y_val)
    return 1 - fitness,
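`evaluate_chromosome` relies on `BitArray.uint`, which reads the bits as an unsigned integer with the first bit as the most significant. A dependency-free sketch of the same decoding follows; `bits_to_uint` and `decode_chromosome` are hypothetical helpers that make the 6 + 4 bit layout and the resulting ranges (1 to 64 and 1 to 16) explicit.

```python
def bits_to_uint(bits):
    """Interpret a sequence of 0/1 values as an unsigned integer, MSB first."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


def decode_chromosome(chromosome):
    """Split a 10-bit chromosome into (window_size, num_neurons).

    Bits 0-5 encode window_size - 1 (range 1..64) and bits 6-9 encode
    num_neurons - 1 (range 1..16).
    """
    if len(chromosome) != 10:
        raise ValueError("expected a 10-bit chromosome")
    window_size = bits_to_uint(chromosome[0:6]) + 1
    num_neurons = bits_to_uint(chromosome[6:10]) + 1
    return window_size, num_neurons


print(decode_chromosome([0, 0, 1, 0, 0, 0, 0, 0, 0, 1]))  # (9, 2)
```

A single helper like this could also replace the decoding that is repeated when the best solutions are displayed.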

In [None]:
# DEAP fitness weights: a positive weight maximizes, a negative one minimizes.
# evaluate_chromosome returns 1 - MSE, so maximizing it minimizes the MSE.
creator.create('FitnessMax', base.Fitness, weights=(1.0,))
creator.create('Individual', list, fitness=creator.FitnessMax)



In [None]:
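# Make results reproducible: bernoulli.rvs draws from NumPy's global
# generator, while DEAP's operators draw from Python's random module
import random
random.seed(1)
np.random.seed(1)  # numpy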

population_size = parameters["population_size"]
num_generations = parameters["num_generations"]
gene_length = 10  # 6 bits for the window size + 4 bits for the number of neurons

toolbox = base.Toolbox()
toolbox.register('binary', bernoulli.rvs, 0.5)
toolbox.register('individual', tools.initRepeat, creator.Individual,
                 toolbox.binary, n=gene_length)
toolbox.register('population', tools.initRepeat, list, toolbox.individual)

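# Bit-string operators: cxOrdered and mutShuffleIndexes are meant for
# permutations, so use two-point crossover and bit-flip mutation
toolbox.register('mate', tools.cxTwoPoint)
toolbox.register('mutate', tools.mutFlipBit, indpb=0.1)  # ~1 flipped bit per genome
# Tournament selection still works when 1 - MSE is zero or negative,
# which selRoulette does not support
toolbox.register('select', tools.selTournament, tournsize=3)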
toolbox.register('evaluate', evaluate_chromosome)

population = toolbox.population(n=population_size)
r = algorithms.eaSimple(population, toolbox, cxpb=0.4, mutpb=0.1,
                        ngen=num_generations, verbose=True)


Window:  9 , Neurons:  2
Epoch 1/5000
Epoch 2/5000
...
Epoch 70/5000

KeyboardInterrupt: ignored
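The `algorithms.eaSimple` call runs a generational loop: select a new population of the same size, apply crossover to consecutive pairs with probability `cxpb` and mutation to each offspring with probability `mutpb`, evaluate, and replace the population. A simplified pure-Python sketch of that loop follows, assuming a bit-string genome. The operators (tournament selection, two-point crossover, bit-flip mutation), the `simple_ga` helper and the toy `one_max` fitness, which stands in for 1 - MSE, are illustrative choices and not DEAP's code.

```python
import random


def one_max(bits):
    """Toy fitness to maximize: the number of ones."""
    return sum(bits)


def tournament(pop, fits, k, tournsize=3):
    """Pick k copies, each the best of `tournsize` random contestants."""
    chosen = []
    for _ in range(k):
        contestants = random.sample(range(len(pop)), tournsize)
        best = max(contestants, key=lambda i: fits[i])
        chosen.append(list(pop[best]))
    return chosen


def two_point(a, b):
    """Swap the slice between two random cut points, in place."""
    i, j = sorted(random.sample(range(1, len(a)), 2))
    a[i:j], b[i:j] = b[i:j], a[i:j]


def flip_bits(ind, indpb):
    """Flip each bit independently with probability indpb, in place."""
    for n in range(len(ind)):
        if random.random() < indpb:
            ind[n] = 1 - ind[n]


def simple_ga(fitness, n_bits=10, pop_size=6, ngen=3, cxpb=0.4, mutpb=0.1, seed=1):
    random.seed(seed)
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fits = [fitness(ind) for ind in pop]
    for _ in range(ngen):
        # Selection: a new population of the same size
        offspring = tournament(pop, fits, len(pop))
        # Crossover on consecutive pairs with probability cxpb
        for k in range(1, len(offspring), 2):
            if random.random() < cxpb:
                two_point(offspring[k - 1], offspring[k])
        # Mutation of each offspring with probability mutpb
        for ind in offspring:
            if random.random() < mutpb:
                flip_bits(ind, indpb=0.1)
        # Evaluation and generational replacement
        pop = offspring
        fits = [fitness(ind) for ind in pop]
    return pop, fits


pop, fits = simple_ga(one_max)
print(max(fits))
```

Unlike this sketch, `eaSimple` only re-evaluates offspring whose fitness was invalidated by crossover or mutation, which matters here because every evaluation trains a network for `parameters["epochs"]` epochs.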

#### Show the 4 best solutions

In [None]:
best_individuals = tools.selBest(population, k=4)
best_window_size_bits = BitArray(best_individuals[0][0:6])
best_num_neurons_bits = BitArray(best_individuals[0][6:])
best_window_size = best_window_size_bits.uint + 1
best_num_neurons = best_num_neurons_bits.uint + 1

for bi in best_individuals:
    window_size_bits = BitArray(bi[0:6])
    num_neurons_bits = BitArray(bi[6:])
    window_size = window_size_bits.uint + 1
    num_neurons = num_neurons_bits.uint + 1
    print('\nWindow Size: ', window_size,
          ', Num of Units: ', num_neurons)

#### Train the best model and evaluate it on the test data

In [None]:
X_train, y_train = prepare_dataset(train_data, train_targets, best_window_size)
X_test, y_test = prepare_dataset(test_data, test_targets, best_window_size)
model = create_model(best_window_size, best_num_neurons, X_train, y_train)
fitness = evaluate_model(model, X_test, y_test)
y_pred = model.predict(X_test)
# Show at most 20 test windows (the test set may hold fewer samples)
for i in range(min(20, len(y_test))):
    print(X_train[i], X_test[i], y_test[i], y_pred[i])
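If the targets are switched to the 0/1 up/down labels of the disabled loop, the MSE alone does not say whether the network gets the direction right; in the recorded outputs further below, for instance, the model predicts the same value (0.5272505) for all 20 printed test days. A minimal sketch with hypothetical helpers `directional_accuracy` and `majority_baseline` thresholds the predictions at 0.5 (an assumption that suits an output trained on 0/1 labels) and compares the result with always predicting the most frequent class.

```python
import numpy as np


def directional_accuracy(y_true, y_pred, threshold=0.5):
    """Share of days whose predicted up/down label matches the true label."""
    y_true = np.asarray(y_true).ravel().astype(int)
    labels = (np.asarray(y_pred).ravel() >= threshold).astype(int)
    return float(np.mean(labels == y_true))


def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent class."""
    y_true = np.asarray(y_true).ravel().astype(int)
    share_up = y_true.mean()
    return float(max(share_up, 1.0 - share_up))


y_true = [0, 1, 1, 0, 1]
y_pred = [0.53, 0.53, 0.53, 0.53, 0.53]  # a constant prediction
print(directional_accuracy(y_true, y_pred))  # 0.6
print(majority_baseline(y_true))             # 0.6
```

In the notebook it would be called as `directional_accuracy(y_test, y_pred)`; a model that does not beat `majority_baseline(y_test)` has not learned anything about the direction.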

In [None]:
X_train, y_train = prepare_dataset(train_data, train_targets, 8)
X_test, y_test = prepare_dataset(test_data, test_targets, 8)
model = create_model(8, 9, X_train, y_train)
# Evaluate the trained model
fitness = evaluate_model(model, X_test, y_test)
y_pred = model.predict(X_test)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Validation error:  [0.7491895  0.64007926 0.64007926 0.7491895  0.64007926 0.7491895
 0.64007926 0.64007926 0.7491895  0.64007926 0.64007926 0.64007926
 0.7491895  0.7491895  0.7491895  0.7491895  0.7491895  0.64007926
 0.7491895  0.64007926 0.64007926 0.7491895  0.64007926 0.7491895
 0.7491895  0.64007926 0.64007926 0.64007926 0.64007926 0.7491895
 0.64007926 0.7491895  0.64007926 0.64007926 0.64007926 0.64007926
 0.64007926 0.7491895  0.64007926 0.7491895  0.64007926 0.7491895
 0.64007926 0.7491895  0.64007926 0.7491895  0.64007926 0.64007926
 0.7491895  0.7491895  0.64007926 0.64007926 0.64007926 0.7491895
 0.64007926 0.7491895  0.64007926 0.7491895  0.64007926 0.7491895
 0.64007926 0.64007926 0.7491895  0.7491895  0.64007926 0.7491895
 0.7491895  0.7491895  0.64007926 0.7491895  0.7491895  0.7491895
 0.64007926

In [None]:
for i in range(min(20, len(y_test))):
    print(y_test[i], y_pred[i])


[0] [0.5272505]
[1] [0.5272505]
[1] [0.5272505]
[0] [0.5272505]
[1] [0.5272505]
[0] [0.5272505]
[1] [0.5272505]
[1] [0.5272505]
[0] [0.5272505]
[1] [0.5272505]
[1] [0.5272505]
[1] [0.5272505]
[0] [0.5272505]
[0] [0.5272505]
[0] [0.5272505]
[0] [0.5272505]
[0] [0.5272505]
[1] [0.5272505]
[0] [0.5272505]
[1] [0.5272505]


In [None]:
for i in range(20):
    print(train_data[i], train_targets[i])

In [None]:
test_data.shape