## Training the DIST_REL_CC_01, DIST_REL_CH_01 and DIST_REL_OH_01 models

In this notebook we train the DIST_REL_CC_01, DIST_REL_CH_01 and DIST_REL_OH_01 models, which attempt to generalize the method used for the DIST_REL_C_01 model.

Each model must predict the distance between the two atoms of a pair (carbon-carbon, carbon-hydrogen or oxygen-hydrogen) from the following information about every atom of the molecule that does not belong to the pair:

* The atomic number (one-hot encoded)
* The atomic mass
* The positional class of the atom relative to the bond (see notebook 9.1)
* The distance to each of the two atoms of the bond

The DIST_REL_CC_01 model is therefore identical to the DIST_REL_C_01 model, except that it is trained on more examples.
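
As a rough illustration of the per-atom description listed above, the sketch below assembles one feature row. The element list, the integer encoding of the positional class, the example values and the helper name `atom_features` are assumptions made for this example only; the actual encoding is the one produced by the data preparation notebooks.

```python
import numpy as np

# Hypothetical element set; the real data may cover other elements.
ELEMENTS = [1, 6, 7, 8, 9]  # H, C, N, O, F


def atom_features(atomic_number, mass, pos_class, dist_a, dist_b):
    """Build one feature row for an atom outside the bonded pair."""
    one_hot = np.zeros(len(ELEMENTS), dtype=np.float32)
    one_hot[ELEMENTS.index(atomic_number)] = 1.0
    rest = np.array([mass, pos_class, dist_a, dist_b], dtype=np.float32)
    return np.concatenate([one_hot, rest])


# An oxygen atom of (assumed) positional class 2, at distances 1.43 and 2.41
# from the two atoms of the bond (arbitrary units)
row = atom_features(8, 15.999, 2, 1.43, 2.41)
print(row.shape)  # (9,)
```

Rows of this kind, concatenated over all atoms outside the pair, would form one input example of the network.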

#### File paths

In [1]:
# DIST_REL_CC model
train_CC_prepared_input_loc = "../data/DIST_REL_CC/train_set_prepared_input.h5"
train_CC_labels_loc = "../data/DIST_REL_CC/train_set_labels.h5"
minimal_CC_prepared_input_loc = "../data/DIST_REL_CC/minimal_set_prepared_input.h5"
minimal_CC_labels_loc = "../data/DIST_REL_CC/minimal_set_labels.h5"

models_CC_loc = "../models/DIST_REL_CC_01/12.1/"
logs_CC_loc = "../models/DIST_REL_CC_01/12.1/"

# DIST_REL_CH model
train_CH_prepared_input_loc = "../data/DIST_REL_CH/train_set_prepared_input.h5"
train_CH_labels_loc = "../data/DIST_REL_CH/train_set_labels.h5"

models_CH_loc = "../models/DIST_REL_CH_01/12.1/"
logs_CH_loc = "../models/DIST_REL_CH_01/12.1/"

# DIST_REL_OH model
train_OH_prepared_input_loc = "../data/DIST_REL_OH/train_set_prepared_input.h5"
train_OH_labels_loc = "../data/DIST_REL_OH/train_set_labels.h5"

models_OH_loc = "../models/DIST_REL_OH_01/12.1/"
logs_OH_loc = "../models/DIST_REL_OH_01/12.1/"


### Definition of the cost and validation functions

#### RMSE (cost)

In [2]:
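import tensorflow as tf  # TensorFlow 1.x API

def rmse(pred, targets):
    # Root mean squared error between predictions and targets
    with tf.name_scope("rmse_loss"):
        return tf.sqrt(tf.reduce_mean(tf.squared_difference(pred, targets)), name="rmse")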
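
The loss above is the square root of the mean of the squared differences between predictions and targets. A NumPy counterpart can be handy to check values by hand outside of a TensorFlow graph; the name `rmse_np` is introduced here for illustration only.

```python
import numpy as np


def rmse_np(pred, targets):
    """NumPy counterpart of the TensorFlow rmse loss."""
    pred = np.asarray(pred, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    return np.sqrt(np.mean((pred - targets) ** 2))


# Errors of 0, 0 and 2 give sqrt(4 / 3)
print(rmse_np([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # about 1.1547
```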

#### Performance evaluation function (negated RMSE)

In [3]:
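def rmse_valid(pred, targets, inputs):
    # TFLearn calls a custom metric with (predictions, targets, inputs); inputs is unused here.
    # The RMSE is negated so that a higher value means a better model.
    with tf.name_scope("rmse_validation"):
        return -rmse(pred, targets)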

## Building the neural network

In [4]:
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
from tflearn.optimizers import Adam
from tflearn.data_preprocessing import DataPreprocessing
import tensorflow as tf
import tflearn as tfl
import math


def creer_RN(epsilon=1e-8, learning_rate=0.001, dropout_val=0.99, stddev_init=0.001,
             hidden_act='relu', outlayer_act='prelu', weight_decay=0.001, width=870, depth=3,
             validation_fun=rmse_valid, cost_fun=rmse, gpu_mem_prop=1):

    # Weight initializer: truncated normal distribution with sigma = stddev_init; values more than
    # 2 sigma away from the mean are redrawn
    winit = tfl.initializations.truncated_normal(stddev=stddev_init, dtype=tf.float32, seed=None)
    
    # Fraction of the GPU memory to use (so that several models can be trained in parallel)
    tfl.init_graph(num_cores=16, gpu_memory_fraction=gpu_mem_prop, soft_placement=True)
    
    # Input layer of the network
    network = input_data(shape=[None, 870], name='input')
    
    # Hidden layers
    for i in range(depth):
        network = fully_connected(network, width, activation=hidden_act, name='fc'+str(i), weights_init=winit,
                                  weight_decay=weight_decay)
        # Dropout: the second argument is the probability of keeping each unit,
        # so dropout_val=0.99 drops about 1% of the activations
        network = dropout(network, dropout_val)
    
    # Output layer: a single unit with the outlayer_act activation ('prelu' by default),
    # initialized with the same truncated normal distribution
    network = fully_connected(network, 1, activation=outlayer_act, name='outlayer', weights_init=winit)
    
    adam = Adam(learning_rate=learning_rate, epsilon=epsilon)
    
    # Regression layer: Adam optimizer with the learning rate given as argument (0.001 by default),
    # cost_fun as the loss (rmse by default) and validation_fun as the metric
    network = regression(network, optimizer=adam,
                         loss=cost_fun, metric=validation_fun, name='target')
            
    return network
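
To make the initializer comment concrete, here is a minimal NumPy sketch of a truncated normal draw in which values farther than two standard deviations from zero are redrawn. It illustrates the idea only and is not TFLearn's implementation; the function name and the seed are chosen for this example.

```python
import numpy as np


def truncated_normal(size, stddev=0.001, seed=None):
    """Draw normal samples and redraw any value farther than 2*stddev from 0."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, stddev, size=size)
    out_of_range = np.abs(samples) > 2 * stddev
    while out_of_range.any():
        # Redraw only the rejected values, then check again
        samples[out_of_range] = rng.normal(0.0, stddev, size=int(out_of_range.sum()))
        out_of_range = np.abs(samples) > 2 * stddev
    return samples


weights = truncated_normal((870, 870), stddev=0.001, seed=0)
print(np.abs(weights).max() <= 0.002)  # True
```

With `stddev_init=0.001`, every initial weight therefore lies within 0.002 of zero.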


## Training the models

In [5]:
import h5py
import tflearn as tfl
import time
from scipy import sparse
import numpy as np
import gc
import tensorflow as tf


def train_model(input_X, labels_y, model_name, model_path, logs_path, samples_per_batch=1000, epochs=5,
                learning_rate=0.001, epsilon=1e-8, dropout=0.99, stddev_init=0.001, hidden_act='relu',
                outlayer_act='prelu', cost_fun=rmse, validation_fun=rmse_valid, width=870, depth=2,
                gpu_mem_prop=1):
    
    total_start_time = time.time()

    tf.reset_default_graph()
    
    # Build the network
    network = creer_RN(learning_rate=learning_rate, epsilon=epsilon, dropout_val=dropout,
                       stddev_init=stddev_init, hidden_act=hidden_act, outlayer_act=outlayer_act,
                       validation_fun=validation_fun, cost_fun=cost_fun, width=width, depth=depth,
                       gpu_mem_prop=gpu_mem_prop)

    # Build the model
    model = tfl.DNN(network, tensorboard_verbose=3, tensorboard_dir=logs_path)

    # Training: 10% of the samples are held out for validation
    model.fit(X_inputs=input_X, Y_targets=labels_y, batch_size=samples_per_batch,
              shuffle=True, snapshot_step=100, validation_set=0.1,
              show_metric=True, run_id=model_name, n_epoch=epochs)

    # Save the model
    model.save(model_path + model_name + ".tflearn")
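
The `dropout` argument of `train_model` is forwarded to TFLearn's `dropout` layer, which interprets it as a keep probability. The NumPy sketch below shows what that means using inverted dropout, the variant applied by TensorFlow's `tf.nn.dropout`; the helper name and the array sizes are chosen for this example only.

```python
import numpy as np


def inverted_dropout(activations, keep_prob, rng):
    """Keep each unit with probability keep_prob and rescale the survivors."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
activations = np.ones((1000, 870), dtype=np.float32)
dropped = inverted_dropout(activations, keep_prob=0.99, rng=rng)

kept_fraction = np.count_nonzero(dropped) / dropped.size
print(round(kept_fraction, 2))  # approximately 0.99
```

A value of 0.99 or 0.98 is thus a very light regularization: only 1 to 2% of the activations are zeroed at each training step.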


In [7]:
import h5py
import numpy as np
import tensorflow as tf
from tflearn.data_utils import pad_sequences


def prepare_data_and_train(train_prepared_input_loc, train_labels_loc, model_name, model_path, logs_path,
                           gpu_mem_prop=1):
    
    # Load the prepared inputs and the labels into memory, closing the HDF5 files afterwards
    with h5py.File(train_prepared_input_loc, 'r') as input_X_h5:
        input_X = np.array(input_X_h5["inputs"])
    with h5py.File(train_labels_loc, 'r') as labels_y_h5:
        labels_y = np.array(labels_y_h5["targets"])
    
    # Pad every example to 870 features and shape the targets as a column vector
    input_X = pad_sequences(input_X, dtype="float32", maxlen=870)
    input_X = input_X.reshape(-1, 870)
    labels_y = labels_y.reshape(-1, 1)
    
    train_model(input_X, labels_y, model_name, model_path, logs_path, samples_per_batch=5000,
                epochs=300, learning_rate=0.01, dropout=0.98, epsilon=0.001, hidden_act="elu",
                outlayer_act="linear", validation_fun=rmse_valid, cost_fun=rmse,
                width=870, depth=3, gpu_mem_prop=gpu_mem_prop)
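
The padding step can be pictured with the NumPy sketch below. It assumes the default behaviour of TFLearn's `pad_sequences` (zeros appended at the end, overlong rows cut at the end); `pad_post` is a stand-in written for this example, not the library function.

```python
import numpy as np


def pad_post(rows, maxlen, value=0.0):
    """Right-pad (or truncate) each row to exactly maxlen values."""
    padded = np.full((len(rows), maxlen), value, dtype=np.float32)
    for i, row in enumerate(rows):
        row = np.asarray(row, dtype=np.float32)[:maxlen]
        padded[i, :len(row)] = row
    return padded


rows = [[1.0, 2.0, 3.0], [4.0]]
print(pad_post(rows, maxlen=5))
# [[1. 2. 3. 0. 0.]
#  [4. 0. 0. 0. 0.]]
```

In the notebook, `maxlen=870` matches the width of the network's input layer, so molecules with fewer atoms simply contribute trailing zeros.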
    
    

### Training DIST_REL_CC_01 on the minimal set (execution test)

In [8]:
"""prepare_data_and_train(minimal_CC_prepared_input_loc, minimal_CC_labels_loc, "DIST_REL_CC_01_basic",
                       models_CC_loc, logs_CC_loc)"""

---------------------------------
Run id: DIST_REL_CC_01_basic
Log directory: ../models/DIST_REL_CC_01/12.1/
INFO:tensorflow:Summary name rmse_validation/ (raw) is illegal; using rmse_validation/__raw_ instead.
---------------------------------
Training samples: 4500
Validation samples: 500
--
Training Step: 1  | time: 2.145s
| Adam | epoch: 001 | loss: 0.00000 - rmse_validation/Neg: 0.0000 | val_loss: 1483.22388 - val_acc: -1483.2239 -- iter: 4500/4500
--
Training Step: 2  | total loss: 1328.89392 | time: 1.511s
| Adam | epoch: 002 | loss: 1328.89392 - rmse_validation/Neg: -1449.6945 | val_loss: 1483.21375 - val_acc: -1483.2137 -- iter: 4500/4500
--
Training Step: 3  | total loss: 1449.69446 | time: 1.505s
| Adam | epoch: 003 | loss: 1449.69446 - rmse_validation/Neg: -1449.6945 | val_loss: 1483.20337 - val_acc: -1483.2034 -- iter: 4500/4500
--
Training Step: 4  | total loss: 1469.82031 | time: 1.514s
| Adam | epoch: 004 | loss: 1469.8

Training Step: 38  | total loss: 1499.57861 | time: 1.514s
| Adam | epoch: 038 | loss: 1499.57861 - rmse_validation/Neg: -1368.7506 | val_loss: 582.21741 - val_acc: -582.2174 -- iter: 4500/4500
--
Training Step: 39  | total loss: 1368.75061 | time: 1.510s
| Adam | epoch: 039 | loss: 1368.75061 - rmse_validation/Neg: -1368.7506 | val_loss: 208.75345 - val_acc: -208.7534 -- iter: 4500/4500
--
Training Step: 40  | total loss: 1220.93823 | time: 1.508s
| Adam | epoch: 040 | loss: 1220.93823 - rmse_validation/Neg: -1220.9382 | val_loss: 516.43182 - val_acc: -516.4318 -- iter: 4500/4500
--
Training Step: 41  | total loss: 1035.77002 | time: 1.517s
| Adam | epoch: 041 | loss: 1035.77002 - rmse_validation/Neg: -1035.7700 | val_loss: 1058.15466 - val_acc: -1058.1547 -- iter: 4500/4500
--
Training Step: 42  | total loss: 941.88751 | time: 1.511s
| Adam | epoch: 042 | loss: 941.88751 - rmse_validation/Neg: -941.8

Training Step: 77  | total loss: 325.05420 | time: 1.515s
| Adam | epoch: 077 | loss: 325.05420 - rmse_validation/Neg: -325.0542 | val_loss: 74.53465 - val_acc: -74.5346 -- iter: 4500/4500
--
Training Step: 78  | total loss: 316.54111 | time: 1.507s
| Adam | epoch: 078 | loss: 316.54111 - rmse_validation/Neg: -316.5411 | val_loss: 126.57973 - val_acc: -126.5797 -- iter: 4500/4500
--
Training Step: 79  | total loss: 292.20221 | time: 1.530s
| Adam | epoch: 079 | loss: 292.20221 - rmse_validation/Neg: -292.2022 | val_loss: 105.71915 - val_acc: -105.7191 -- iter: 4500/4500
--
Training Step: 80  | total loss: 275.09537 | time: 1.527s
| Adam | epoch: 080 | loss: 275.09537 - rmse_validation/Neg: -275.0954 | val_loss: 88.09415 - val_acc: -88.0942 -- iter: 4500/4500
--
Training Step: 81  | total loss: 257.92120 | time: 1.537s
| Adam | epoch: 081 | loss: 257.92120 - rmse_validation/Neg: -257.9212 | val_loss: 92

KeyboardInterrupt: 
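
To follow the training and validation losses without reading the raw output, the epoch lines can be parsed with a regular expression. The sketch below is one possible way to do it, run on three lines copied from the output above; the pattern and the helper name `parse_log` are written for this example, and lines that were cut off in the output simply do not match.

```python
import re

# Three lines copied from the training output above
log = """\
| Adam | epoch: 077 | loss: 325.05420 - rmse_validation/Neg: -325.0542 | val_loss: 74.53465 - val_acc: -74.5346 -- iter: 4500/4500
| Adam | epoch: 078 | loss: 316.54111 - rmse_validation/Neg: -316.5411 | val_loss: 126.57973 - val_acc: -126.5797 -- iter: 4500/4500
| Adam | epoch: 079 | loss: 292.20221 - rmse_validation/Neg: -292.2022 | val_loss: 105.71915 - val_acc: -105.7191 -- iter: 4500/4500
"""

PATTERN = re.compile(r"epoch: (\d+) \| loss: ([\d.]+) .* val_loss: ([\d.]+)")


def parse_log(text):
    """Return a list of (epoch, training loss, validation loss) tuples."""
    return [(int(e), float(loss), float(val))
            for e, loss, val in PATTERN.findall(text)]


for epoch, loss, val_loss in parse_log(log):
    print(epoch, loss, val_loss)
# 77 325.0542 74.53465
# 78 316.54111 126.57973
# 79 292.20221 105.71915
```

On these three epochs the training loss decreases steadily while the validation loss fluctuates, which is expected with only 500 validation samples.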

### Training DIST_REL_CC_01

In practice, the calls below are run independently so that the models can train in parallel.
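
Training several models on the same GPU only works if the memory fractions requested through `gpu_mem_prop` fit together. The sketch below is a simple sanity check under that assumption; the job table reuses the `gpu_mem_prop` values passed in this notebook, while the helper `check_gpu_budget` is hypothetical and not part of the notebook's code.

```python
# Model name -> fraction of the GPU memory requested through gpu_mem_prop
jobs = {
    "DIST_REL_CC_01_basic": 0.28,
    "DIST_REL_CH_01_basic": 0.31,
    "DIST_REL_OH_01_basic": 0.28,
}


def check_gpu_budget(fractions, budget=1.0):
    """Return the total requested fraction, raising if it exceeds the budget."""
    total = sum(fractions.values())
    if total > budget:
        raise ValueError("GPU memory over-committed: %.2f > %.2f" % (total, budget))
    return total


total = check_gpu_budget(jobs)
print("%.2f" % total)  # 0.87
```

The three requests add up to 0.87 of the GPU memory, which leaves some headroom for the rest of the system.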

In [10]:
"""prepare_data_and_train(train_CC_prepared_input_loc, train_CC_labels_loc, "DIST_REL_CC_01_basic",
                       models_CC_loc, logs_CC_loc, gpu_mem_prop=0.28)"""


### Training DIST_REL_CH_01

In [None]:
prepare_data_and_train(train_CH_prepared_input_loc, train_CH_labels_loc, "DIST_REL_CH_01_basic",
                       models_CH_loc, logs_CH_loc, gpu_mem_prop=0.31)

### Training DIST_REL_OH_01

In [None]:
"""prepare_data_and_train(train_OH_prepared_input_loc, train_OH_labels_loc, "DIST_REL_OH_01_basic",
                       models_OH_loc, logs_OH_loc, gpu_mem_prop=0.28)"""