# Tutorial 7

## Large-scale prediction of the test set

Once our models are trained, we are ready to predict, for each protein in the test set, the probability of the GO term associated with each model.
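
The result can be pictured as one probability per (protein, GO term) pair. The sketch below is a hypothetical illustration, not necessarily the format `singleTermPredPipeline` writes to `predictions.tsv`. It turns per-term probability vectors into tab-separated `protein, term, probability` rows. The helper `to_long_format` and all IDs and probabilities are made up.

```python
import csv
import io
from typing import Dict, List


def to_long_format(protein_ids: List[str], term_probs: Dict[str, List[float]]) -> List[List[str]]:
    """Hypothetical helper: one (protein, term, probability) row per pair.

    term_probs maps each GO term to the probabilities predicted by that
    term's model, in the same order as protein_ids.
    """
    rows = []
    for term, probs in term_probs.items():
        if len(probs) != len(protein_ids):
            raise ValueError(f"{term}: {len(probs)} probabilities for {len(protein_ids)} proteins")
        for protein_id, prob in zip(protein_ids, probs):
            rows.append([protein_id, term, f"{prob:.3f}"])
    return rows


# Made-up example: two test proteins scored by two single-term models
proteins = ["P1", "P2"]
predictions = {"GO:0005912": [0.91, 0.12], "GO:0005737": [0.40, 0.77]}

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(to_long_format(proteins, predictions))
print(buffer.getvalue(), end="")
```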

```python
# Ignore warnings
import warnings
warnings.filterwarnings('ignore')
```

Once again, to avoid running out of memory, we have developed a pipeline that makes the predictions for all proteins for a single term. We run it in a separate process for each term.
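
The one-process-per-term idea can be sketched with the standard library alone. This is a minimal sketch, not the tutorial's pipeline. `predict_single_term` is a hypothetical stand-in that only writes a marker file, where the real worker would load a model and predict. Each child process exits before the next one starts, so whatever memory it allocated is returned to the operating system.

```python
import multiprocessing
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def predict_single_term(term: str, out_dir: str) -> None:
    """Hypothetical stand-in for a per-term prediction pipeline.

    A real worker would load the term's model, predict every test protein
    and save the results; here it only writes a marker file.
    """
    Path(out_dir, f"{term}.done").write_text(term)


def run_one_process_per_term(terms: Iterable[str], out_dir: str) -> Dict[str, int]:
    exit_codes = {}
    for term in terms:
        # Pass the function itself as target and its arguments separately;
        # target=predict_single_term(term, out_dir) would run it in the parent.
        p = multiprocessing.Process(target=predict_single_term, args=(term, out_dir))
        p.start()
        p.join()  # wait, so only one term is being predicted at any time
        exit_codes[term] = p.exitcode  # 0 on success, non-zero if the child failed
    return exit_codes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_one_process_per_term(["0005912", "0005737"], tmp))
```

The `if __name__ == "__main__":` guard matters when `multiprocessing` uses the spawn start method (the default on Windows and macOS), because child processes then re-import the main module.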

```python
# Add the scripts folder to the import path so the pipeline module can be found
import sys
sys.path.append('../scripts/')

from singleTermPredPipeline import singleTermPredPipeline
```

```python
# Some imports
from manas_cafa.bio.protein import Protein
from pathlib import Path
import tensorflow as tf
import multiprocessing
```

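```python
# Allow memory growth on every visible GPU, so TensorFlow allocates memory
# on demand instead of reserving it all up front (no-op if no GPU is found)
physical_devices = tf.config.experimental.list_physical_devices('GPU')
for device in physical_devices:
    tf.config.experimental.set_memory_growth(device, True)
```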

Now we just have to run each model on all the proteins in the test set. To avoid filling up the GPU memory, we predict in batches of 2500 proteins at a time.
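
Splitting the test set into fixed-size batches is straightforward. The helper below is a hypothetical sketch, since this tutorial does not show how `singleTermPredPipeline` splits its input internally. It yields consecutive slices of at most `batch_size` items, so only one batch needs to be encoded and passed to the model at a time.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 2500) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Made-up test set of 6000 protein IDs, split with the tutorial's batch size.
# A real pipeline would encode each batch and pass it to the model here.
protein_ids = [f"P{i}" for i in range(6000)]
print([len(batch) for batch in batched(protein_ids)])  # [2500, 2500, 1000]
```

On Python 3.12 and later, `itertools.batched(iterable, n)` provides the same grouping, yielding tuples instead of slices.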

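```python
for path in Path('/data/models/').rglob('*.h5'):
    print(path)

    # Pass the function and its arguments separately: calling
    # singleTermPredPipeline(...) here would run it in this process
    # and hand its return value to Process as the target.
    p = multiprocessing.Process(
        target=singleTermPredPipeline,
        args=(path,),
        kwargs={'results_path': '/data/models/predictions.tsv', 'batch_size': 2500},
    )
    p.start()
    p.join()  # one model at a time; its memory is released when the child exits
    if p.exitcode != 0:
        print(f'Prediction failed for {path} (exit code {p.exitcode})')
```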

```
/data/models/12/0005912.h5
PREDICTING /data/models/12/0005912.h5
Predicting for 2500 proteins
Predicting for 2500 proteins
Predicting for 2499 proteins
Predicting for 2498 proteins
Predicting for 2500 proteins
Predicting for 2499 proteins
```
