### Evolutionary Design of Artificial Neural Networks
### Baseline
Based on the paper [Predicting Daily Returns of Global Stocks Indices: Neural Networks vs Support Vector Machines. Kaur, Dharni 2019](https://journaljemt.com/index.php/JEMT/article/view/30179)

The goal is to reproduce the paper's ANN results for the S&P 500.
A subset of the daily S&P 500 (GSPC) closing data obtained from Yahoo Finance is used.
The technical indicators were computed according to the formulas in the paper.
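
As an illustration of how indicator columns such as `HH14`, `LL14`, `%K14`, `MA5` and `Disparity5` can be derived from prices, here is a minimal sketch using common textbook definitions. It is an assumption that these match the paper's formulas; the helper names (`rolling`, `stochastic_k`, `disparity`) and the toy prices are hypothetical and are not part of the notebook.

```python
import numpy as np

def rolling(values, n, reducer):
    """Apply `reducer` to each trailing window of n values (NaN until n exist)."""
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), np.nan)
    for i in range(n - 1, len(values)):
        out[i] = reducer(values[i - n + 1:i + 1])
    return out

def stochastic_k(high, low, close, n):
    """%K: position of the close within the n-day high-low range, in percent."""
    hh = rolling(high, n, np.max)   # highest high over n days
    ll = rolling(low, n, np.min)    # lowest low over n days
    return 100.0 * (np.asarray(close, dtype=float) - ll) / (hh - ll)

def disparity(close, n):
    """Close as a percentage of its n-day simple moving average."""
    return 100.0 * np.asarray(close, dtype=float) / rolling(close, n, np.mean)

# Toy prices (made up for illustration, not S&P 500 data)
high = [10, 11, 12, 13, 14]
low = [8, 9, 10, 11, 12]
close = [9, 10, 11, 12, 13]

k3 = stochastic_k(high, low, close, n=3)
d3 = disparity(close, n=3)
print(k3)
print(d3)
```

The first `n - 1` entries are NaN because a full window is not yet available, so any dataset built this way loses its earliest rows.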

#### Import packages

In [0]:
import numpy as np
import sklearn.preprocessing as skp
import tensorflow as tf

#### Configuration parameters
All configurable parameters are grouped here.


In [0]:
p = {"local_file" : "GSPC.csv",
     "url" : "https://raw.githubusercontent.com/"+
             "jmacostap/webstore/master/GSPCext.csv",
     "cache_dir" : "/content",
     "columns" : {"Date" : 0, "Open" : 1, "High" : 2, "Low" : 3, "Close" : 4,
                  "Adj Close" : 5, "Volume" : 6, "Monday" : 7, "Tuesday" : 8,
                  "Wednesday" : 9, "Thursday" : 10, "Friday" : 11,
                  "Return" : 12, "HH14" : 13, "LL14" : 14, "%K14" : 15,
                  "%D3" : 16, "Slow%D3" : 17, "Momentum" : 18, "ROC" : 19,
                  "%R" : 20, "A/D Oscillator" : 21, "MA5" : 22, "MA10" : 23,
                  "Disparity5" : 24, "Disparity10" : 25, "OSCP" : 26,
                  "TP" : 27, "ADP" : 28, "AD" : 29, "AAD" : 30, "CCI" : 31,
                  "Gain" : 32, "Loss" : 33, "AVGG" : 34, "AVGL" : 35,
                  "RS" : 36, "RSI" : 37, "Return%" : 38},
     "throwaway" : 776,  # paper uses 3021 days from 2005-04-01 to 2017-03-31
     "samples" : 3021,  # how many samples to use out of the 23193 available
     "scaler" : skp.MinMaxScaler(),
     "test_fraction" : 0.20,  # 20% of samples for test data
     "neurons" : 9,
     "activation" : "sigmoid",  # for the hidden layer
     "output_activation" : "linear",  # for the output layer
     "learning_rate" : 0.3,
     "momentum" : 0.2,
     "nesterov" : False,
     # optimizer is below
     "metrics" : ["mae","mape"],
     "loss" : "mse",  # could be mae or mse
     "epochs" : 500,  # training epochs
     "batch_size" : 100,
     "shuffle" : True,  # the training data
     "verbose" : 2,  # verbose training
}
p["usecols"] = (p["columns"]["Open"],p["columns"]["High"],
                p["columns"]["Low"],p["columns"]["Close"],
                p["columns"]["Volume"],p["columns"]["%K14"],
                p["columns"]["%D3"],p["columns"]["Slow%D3"],
                p["columns"]["Momentum"],p["columns"]["ROC"],
                p["columns"]["%R"], p["columns"]["A/D Oscillator"],
                p["columns"]["MA5"],p["columns"]["MA10"],
                p["columns"]["Disparity5"],p["columns"]["Disparity10"],
                p["columns"]["OSCP"],p["columns"]["CCI"],
                p["columns"]["RSI"],p["columns"]["Return"],
)
p["optimizer"] = tf.keras.optimizers.SGD(
                 learning_rate=p["learning_rate"],
                 momentum=p["momentum"],
                 nesterov=p["nesterov"],
)


#### Read the input data directly from the GitHub repository

In [0]:
path_to_downloaded_file = tf.keras.utils.get_file(
    p["local_file"],
    p["url"],
    cache_dir=p["cache_dir"],
    )
data = np.loadtxt(path_to_downloaded_file, skiprows=1,
                  delimiter=",", usecols=p["usecols"],
                  )
print(f"{data.shape} samples read")
# Select "samples" and discard the most recent throwaway
# Extract targets
targets = np.reshape(
    data[-p["samples"]-p["throwaway"]:-p["throwaway"], -1],
    (-1,1),
    )
print(f"Targets to use: {targets.shape}")
# Extract the features
data = data[-p["samples"]-p["throwaway"]:-p["throwaway"], :-1]
print(f"Samples to use: {data.shape}")
# Normalize the data to [0, 1]
data_scaler = p["scaler"]
data = data_scaler.fit_transform(data)
targets_scaler = skp.MinMaxScaler()
targets = targets_scaler.fit_transform(targets)

# Split off the test data (np.int was removed in NumPy 1.24; use the built-in int)
test_fraction = int(len(data)*p["test_fraction"])
print(f"Test fraction: {test_fraction}")
train_targets = targets[:-test_fraction]
print(f"Train targets: {train_targets.shape}")
train_data = data[:-test_fraction]
print(f"Train samples: {train_data.shape}")
test_targets = targets[-test_fraction:]
print(f"Test targets: {test_targets.shape}")
test_data = data[-test_fraction:]
print(f"Test samples: {test_data.shape}")

Downloading data from https://raw.githubusercontent.com/jmacostap/webstore/master/GSPCext.csv
(23193, 19) samples read
Targets to use: (3021, 1)
Samples to use: (3021, 18)
Test fraction: 604
Train targets: (2417, 1)
Train samples: (2417, 18)
Test targets: (604, 1)
Test samples: (604, 18)
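
The window selection and the chronological split above rely on negative slice indices. The sketch below reproduces the same logic on a toy array and highlights one pitfall: a slice ending in `-0` is empty, so the negative-index form only works when `throwaway` is greater than zero. The helpers `select_window` and `split_tail` are hypothetical names introduced for this illustration.

```python
import numpy as np

def select_window(rows, samples, throwaway):
    """Return `samples` rows ending `throwaway` rows before the end."""
    end = len(rows) - throwaway
    return rows[end - samples:end]

def split_tail(rows, test_fraction):
    """Chronological split: the last fraction of the rows becomes the test set."""
    n_test = int(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]

rows = np.arange(20)  # stand-in for 20 daily observations, oldest first

# Same rows as the negative-index form rows[-samples-throwaway:-throwaway]
window = select_window(rows, samples=10, throwaway=4)
same_as_negative = np.array_equal(window, rows[-14:-4])

# Pitfall: with throwaway == 0 the negative-index form yields nothing
empty = rows[-10:-0]
safe = select_window(rows, samples=10, throwaway=0)

train, test = split_tail(window, 0.20)
print(window, len(empty), safe, train, test)
```

Computing explicit start and end positions avoids the empty-slice case and makes it easier to change `throwaway` later.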


#### Helper function definitions

In [0]:
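def create_model(input_size, neurons, X_train, y_train):
    """Build, compile, and train a network with one hidden layer.

    All other hyperparameters are read from the global dict `p`.
    """
    # Create the model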
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(
            neurons,
            activation=p["activation"],
            input_shape=(input_size,),
            ),
        tf.keras.layers.Dense(
            1,
            activation=p["output_activation"],
            )
        ])
    model.compile(
        optimizer=p["optimizer"],
        metrics=p["metrics"],
        loss=p["loss"],
        )

    # Train the model
    model.fit(X_train, y_train, epochs=p["epochs"],
              batch_size=p["batch_size"],
              shuffle=p["shuffle"],
              verbose=p["verbose"],
              validation_split=p["test_fraction"],
              )
    return model

#### Create and train the model

In [0]:
np.random.seed(31416)  # numpy
tf.random.set_seed(31416)  # keras

model = create_model(
    len(p["usecols"])-1,
    p["neurons"],
    train_data,
    train_targets,
    )

Epoch 1/500
20/20 - 0s - loss: 0.1676 - mae: 0.2502 - mape: 267555.9688 - val_loss: 0.0030 - val_mae: 0.0463 - val_mape: 10.5105
Epoch 2/500
20/20 - 0s - loss: 0.0064 - mae: 0.0577 - mape: 207529.0156 - val_loss: 0.0026 - val_mae: 0.0436 - val_mape: 10.0823
Epoch 3/500
20/20 - 0s - loss: 0.0059 - mae: 0.0537 - mape: 215108.4844 - val_loss: 0.0017 - val_mae: 0.0319 - val_mape: 7.2346
Epoch 4/500
20/20 - 0s - loss: 0.0052 - mae: 0.0494 - mape: 222987.5938 - val_loss: 0.0034 - val_mae: 0.0506 - val_mape: 11.8765
Epoch 5/500
20/20 - 0s - loss: 0.0052 - mae: 0.0491 - mape: 219992.2969 - val_loss: 0.0013 - val_mae: 0.0292 - val_mape: 6.6962
Epoch 6/500
20/20 - 0s - loss: 0.0047 - mae: 0.0457 - mape: 230700.8281 - val_loss: 0.0015 - val_mae: 0.0284 - val_mape: 6.3099
Epoch 7/500
20/20 - 0s - loss: 0.0047 - mae: 0.0458 - mape: 211854.3750 - val_loss: 0.0014 - val_mae: 0.0277 - val_mape: 6.1405
Epoch 8/500
20/20 - 0s - loss: 0.0043 - mae: 0.0438 - mape: 224331.7812 - val_loss: 0.0019 - val_mae:
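
The `20/20` in the log is the number of batches per epoch. A rough way to see where it comes from, assuming Keras holds out the last `validation_split` fraction of the samples passed to `fit` (before shuffling) and rounds the training part down, is the arithmetic below. The function `fit_partition` is a hypothetical helper, not a Keras API.

```python
import math

def fit_partition(n_samples, validation_split, batch_size):
    """Estimate training size, validation size, and batches per epoch."""
    n_train = math.floor(n_samples * (1.0 - validation_split))
    n_val = n_samples - n_train
    steps = math.ceil(n_train / batch_size)  # last batch may be partial
    return n_train, n_val, steps

# 2417 training samples, validation_split of 0.20, batch_size of 100
n_train, n_val, steps = fit_partition(2417, 0.20, 100)
print(n_train, n_val, steps)
```

Under these assumptions only part of the 2417 training samples is used for weight updates; the remainder produces the `val_*` metrics.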

#### Evaluate the model on the test data

In [0]:
loss_error, mae_error, mape_error = model.evaluate(test_data, test_targets)
print(f"Test error: {loss_error}, MAE: {mae_error}, MAPE: {mape_error}")

Test error: 0.000689507694914937, MAE: 0.020724499598145485, MAPE: 4.72582483291626
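
These errors are expressed in the min-max scaled target space, not in units of daily return. For a scaler that maps the targets linearly to [0, 1], the MAE can be converted back by multiplying by the target range (max minus min), and the MSE by the square of that range. The sketch below checks the MAE relation on made-up numbers; the helper names and the data are assumptions for illustration. With scikit-learn's `MinMaxScaler`, the range is available as the fitted `data_range_` attribute.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error of two equal-length sequences."""
    return sum(abs(t - q) for t, q in zip(y_true, y_pred)) / len(y_true)

def scale(values, lo, hi):
    """Min-max scaling with a fixed minimum and maximum."""
    return [(v - lo) / (hi - lo) for v in values]

# Made-up daily returns and predictions, for illustration only
y_true = [-0.02, 0.01, 0.03, 0.00]
y_pred = [-0.01, 0.00, 0.02, 0.01]

lo, hi = min(y_true), max(y_true)
mae_scaled = mae(scale(y_true, lo, hi), scale(y_pred, lo, hi))
mae_original = mae(y_true, y_pred)

# Undo the scaling of the error: multiply by the target range
recovered = mae_scaled * (hi - lo)
print(mae_scaled, mae_original, recovered)
```

MAPE cannot be converted this way, because min-max scaling also shifts the values and so changes the denominators of the percentage errors.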
