# Tutorial 6

## Massive model training

Now that we know how to train and benchmark a single model, let's train models at scale, one for each candidate GO term.

In [1]:
# Ignore warnings 
import warnings
warnings.filterwarnings('ignore')

Because this large-scale training runs in a notebook, we call a script that does the training rather than training directly in the notebook process. Otherwise, GPU memory fills up and an error is thrown. The script in the `/scripts/` folder trains a model in the same way as the previous tutorial. It is executed for each candidate GO term in a separate process, which releases its GPU memory when it finishes.
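
The isolation pattern described above can be sketched with the standard library alone. This is a minimal sketch, not the tutorial's script: `fake_training` and `run_isolated` are hypothetical names, and the `fork` start method is an assumption that holds on Linux and macOS. Note that `Process` receives the function and its arguments separately, so the call itself happens in the child process.

```python
import multiprocessing as mp


def fake_training(go_term):
    """Hypothetical stand-in for a training function (not the real pipeline)."""
    if not go_term.startswith("GO:"):
        raise ValueError("not a GO term: " + go_term)
    # A real training function would allocate GPU memory here; that memory
    # is returned to the system when the child process exits.


def run_isolated(target, *args):
    """Run target(*args) in a child process and return its exit code."""
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    # target is the function itself and args its arguments; writing
    # target=target(*args) would run the call in the parent process instead.
    proc = ctx.Process(target=target, args=args)
    proc.start()
    proc.join()  # block until the child has finished
    return proc.exitcode  # 0 on success, non-zero if the child raised
```

Calling `run_isolated(fake_training, "GO:0031386")` returns `0`, while an argument that makes the function raise gives a non-zero exit code, which the parent can use to log the failure and move on to the next term.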

In [2]:
import sys
sys.path.append('../scripts/')
from singleTermPipeline import singleTermPipeline

In [3]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import pandas as pd
import multiprocessing

Allow GPU memory growth, so that TensorFlow allocates GPU memory as needed instead of reserving it all up front.

In [4]:
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

Let's load the IA-ordered candidate terms (in this example, those with at least 75 proteins in the training set).

In [5]:
ordered_candidates_df = pd.read_csv( '../data/go_terms_test_parent_candidates_maxlen500_minmembers75_ordered.tsv', 
                                    index_col=0, sep='\t', header=0)
ordered_candidates_df.head()

go_term,ia
GO:0031386,11.619083
GO:0045505,11.228458
GO:0048038,11.164429
GO:0031681,10.90653
GO:0097602,10.505992


For this tutorial, let's train 5 models, each in its own process, and save them into a working folder.

In [6]:
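for go_term in ordered_candidates_df.index[0:5]:
    # Models are grouped in subfolders named after the last two digits of the GO ID
    model_save_path = '/data/models/'+go_term[-2:]+'/'+go_term.replace("GO:","")+'.h5'

    # Pass the function and its arguments separately, so that the call happens
    # in the child process and its GPU memory is freed when the process exits
    p = multiprocessing.Process(target=singleTermPipeline,
                                args=(go_term, 10, 32, model_save_path))
    p.start()
    p.join()  # wait for this term to finish before starting the next one
    if p.exitcode != 0:
        # Errors raised in the child do not propagate here; report and move on
        print('Training failed for', go_term, 'with exit code', p.exitcode)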

RUNNING PIPELINE FOR GO term: GO:0031386
Positive examples: 113
Negative examples: 1064
AUC: 0.9834404349975285

RUNNING PIPELINE FOR GO term: GO:0045505
Positive examples: 82
Negative examples: 787
AUC: 0.9631973140495868

RUNNING PIPELINE FOR GO term: GO:0048038
Positive examples: 126
Negative examples: 1205
AUC: 0.9806749832924928

RUNNING PIPELINE FOR GO term: GO:0031681
Positive examples: 78
Negative examples: 738
AUC: 0.9946460440214158

RUNNING PIPELINE FOR GO term: GO:0097602
Positive examples: 108
Negative examples: 1014
AUC: 0.9716500900618685
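
The save path used above places each model in a subfolder named after the last two characters of its GO ID. A small helper makes that layout explicit. This is a sketch under assumptions: `model_path_for` is a hypothetical name, and whether `singleTermPipeline` creates missing folders itself is not stated here, so the usage example creates the folder explicitly (in a temporary directory rather than `/data/models/`).

```python
import tempfile
from pathlib import Path


def model_path_for(go_term, root):
    """Return root/<last two characters of the ID>/<ID without 'GO:'>.h5."""
    term_id = go_term.replace("GO:", "")
    return Path(root) / term_id[-2:] / (term_id + ".h5")


# Usage: build the path under a temporary root and create its parent folder.
with tempfile.TemporaryDirectory() as tmp_root:
    path = model_path_for("GO:0031386", tmp_root)
    path.parent.mkdir(parents=True, exist_ok=True)
    relative = path.relative_to(tmp_root).as_posix()
    folder_created = path.parent.is_dir()

print(relative)  # 86/0031386.h5
```

Spreading the files over subfolders this way keeps any single folder from growing too large when models are trained for all candidate terms.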

