How to add an environment to Jupyter Notebook:<br>
`
conda env list
conda install -c anaconda ipykernel
python -m ipykernel install --user --name=EnvironmentName
`

What I have added in TrialEnv2:<br>
`
conda install python=3.10
conda install numpy
conda install -c conda-forge openbabel
conda install pandas
conda install -c conda-forge rdkit
conda install -c anaconda scikit-learn
`

In [None]:
#!conda install -y -c conda-forge openbabel
# Import modules
import os
import warnings
import numpy as np
import keras
import keras.backend as K
from openbabel import pybel, openbabel
from sklearn.model_selection import train_test_split

from data import Featurizer, make_grid
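# Absolute import: a relative import (leading dot) fails in a notebook,
# because the notebook is not part of a package
from net.PUResNet import PUResNet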
from train_functions import get_grids, get_training_data, DiceLoss

data_folder_path = "../../TrainData/train_test_families_1rep" # Set to the name of the zip

: 

## Prepare the data

In [None]:
# To not see any warnings: pybel.ob.obErrorLog.StopLogging()
# To see warnings: pybel.ob.obErrorLog.StartLogging()

proteins, binding_sites, _ = get_training_data(data_folder_path)

: 

Save and load the NumPy arrays created:

In [None]:
# Save training data (uncomment to write the arrays to disk)
# np.save(data_folder_path+'_proteins.npy', proteins)
# np.save(data_folder_path+'_binding_sites.npy', binding_sites)

: 

In [None]:
# Load training data
proteins = np.load(data_folder_path+'_proteins.npy')
binding_sites = np.load(data_folder_path+'_binding_sites.npy')

: 
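
The save and load cells above can be checked with a small round trip. This is a minimal sketch, not the notebook's own code: the helper names `save_arrays` and `load_arrays`, the demo arrays, and their shapes are hypothetical stand-ins for the real `proteins` and `binding_sites` arrays. Only the file-naming scheme (`prefix + '_proteins.npy'`, `prefix + '_binding_sites.npy'`) is taken from the notebook.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the real arrays; the shapes are illustrative only.
proteins_demo = np.random.rand(4, 8, 8, 8, 3).astype(np.float32)
binding_sites_demo = np.random.rand(4, 8, 8, 8, 1).astype(np.float32)


def save_arrays(prefix, proteins, binding_sites):
    """Write both arrays using the notebook's '<prefix>_<name>.npy' scheme."""
    np.save(prefix + "_proteins.npy", proteins)
    np.save(prefix + "_binding_sites.npy", binding_sites)


def load_arrays(prefix):
    """Read both arrays back from disk."""
    proteins = np.load(prefix + "_proteins.npy")
    binding_sites = np.load(prefix + "_binding_sites.npy")
    return proteins, binding_sites


# Round trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    prefix = os.path.join(tmp_dir, "demo")
    save_arrays(prefix, proteins_demo, binding_sites_demo)
    loaded_proteins, loaded_sites = load_arrays(prefix)

roundtrip_ok = (
    np.array_equal(proteins_demo, loaded_proteins)
    and np.array_equal(binding_sites_demo, loaded_sites)
)
print("Round trip preserved both arrays:", roundtrip_ok)
```

Saving the featurized arrays once and loading them in later sessions avoids repeating the featurization step, which is presumably the slow part.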

In [None]:
# Check that the two arrays have the same number of samples (first dimension)
print(proteins.shape)
print(binding_sites.shape)

: 
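
Instead of comparing the printed shapes by eye, the check can fail loudly. A minimal sketch, assuming that the first axis of each array indexes the samples; the function name `check_same_sample_count` is hypothetical and not part of the project.

```python
import numpy as np


def check_same_sample_count(inputs, targets):
    """Return the shared sample count, or raise if the first axes differ."""
    if inputs.shape[0] != targets.shape[0]:
        raise ValueError(
            "Sample count mismatch: {} inputs vs {} targets".format(
                inputs.shape[0], targets.shape[0]
            )
        )
    return inputs.shape[0]


# Illustrative arrays only; in the notebook these would be
# `proteins` and `binding_sites`.
n_samples = check_same_sample_count(np.zeros((4, 8, 8, 8, 3)),
                                    np.zeros((4, 8, 8, 8, 1)))
print("Samples:", n_samples)
```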

In [None]:
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(proteins, 
                                                    binding_sites, 
                                                    test_size=0.2, 
                                                    random_state=42)

: 
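
The split above holds out 20% of the samples for testing, and the `model.fit` call in the training section holds out a further 10% of the training set through `validation_split=0.1`. The sketch below works out the resulting set sizes. It is an approximation under stated assumptions: the rounding rules (ceiling for the test set, floor for the Keras training part) reflect my understanding of scikit-learn and Keras and may differ between versions, and the sample count of 2000 is a hypothetical example.

```python
import math


def split_sizes(n_samples, test_size=0.2, validation_split=0.1):
    """Estimate (train, validation, test) sizes for the two-stage split.

    Assumed rounding: the test set is rounded up, and Keras keeps the
    first floor(n * (1 - validation_split)) training samples for fitting.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train_total = n_samples - n_test
    n_fit = math.floor(n_train_total * (1 - validation_split))
    n_val = n_train_total - n_fit
    return n_fit, n_val, n_test


# Hypothetical dataset size, for illustration only.
n_fit, n_val, n_test = split_sizes(2000)
print("fit:", n_fit, "validation:", n_val, "test:", n_test)
```

Note that Keras takes the validation samples from the end of the arrays passed to `fit`, before any shuffling, so the validation set is fixed across epochs.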

## Train the model

In [None]:
# Based on this tutorial:
# https://keras.io/examples/vision/3D_image_classification/

## DEFINE VARIABLES ##
# In the paper, a batch size of 5 was used.
# They also found DiceLoss to be the best loss function to train the model
batch_size = 5
epochs = 300
loss_function = DiceLoss


## DEFINE CALLBACKS ##
# A Callback is an object that can perform actions at various stages of training
# ModelCheckpoint will save the best weights of the training
# EarlyStopping stops the training when val_loss stops improving
checkpoint_cb = keras.callbacks.ModelCheckpoint(
    filepath=data_folder_path+"_best_weights.h5",
    monitor = "val_loss",
    save_best_only=True)
early_stopping_cb = keras.callbacks.EarlyStopping(monitor="val_loss", 
                                                  patience=15)

## TRAIN THE MODEL ##
# THIS IS WHERE FINE-TUNING SHOULD BE DONE
model = PUResNet()
model.compile(loss=loss_function, optimizer="adam", 
              metrics=["accuracy"])

model.fit(X_train, y_train, 
          batch_size=batch_size, epochs=epochs, 
          validation_split=0.1, shuffle=True,
          callbacks=[checkpoint_cb, early_stopping_cb])

: 
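
For reference, a Dice loss measures the overlap between the predicted and the true binding-site masks. The sketch below is a plain NumPy version of the common soft Dice formulation, written only to illustrate the idea. It is not the `DiceLoss` imported from `train_functions`, whose exact form (smoothing term, reduction axes) may differ; the name `dice_loss_sketch` and the `smooth` parameter are assumptions.

```python
import numpy as np


def dice_loss_sketch(y_true, y_pred, smooth=1.0):
    """Soft Dice loss: 1 - (2*|A.B| + s) / (|A| + |B| + s), over all voxels."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (
        np.sum(y_true) + np.sum(y_pred) + smooth
    )
    return 1.0 - dice


# A perfect prediction gives a loss of 0; disjoint masks give a high loss.
perfect = dice_loss_sketch([1, 0, 1], [1, 0, 1])
disjoint = dice_loss_sketch([1, 0], [0, 1])
print("perfect overlap:", perfect)
print("no overlap:", disjoint)
```

Because the loss depends on overlap rather than on per-voxel agreement, it is less dominated by the many empty voxels than plain accuracy, which is one reason it is popular for sparse segmentation targets.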

## Visualise the model

In [None]:
np.save(data_folder_path+'_accuracy.npy', model.history.history["accuracy"])
np.save(data_folder_path+'_val_loss.npy', model.history.history["val_loss"])

: 

In [None]:
# Visualize the performance of the model
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2, figsize=(20, 3))
ax = ax.ravel()

for i, metric in enumerate(["accuracy", "loss"]):
    ax[i].plot(model.history.history[metric])
    ax[i].plot(model.history.history["val_" + metric])
    ax[i].set_title("Model {}".format(metric))
    ax[i].set_xlabel("epochs")
    ax[i].set_ylabel(metric)
    ax[i].legend(["train", "val"])
    
# Save the figure next to the training data
plt.savefig(data_folder_path+'.png')

: 

Info:
* train_test_subset_2000: trained with batch_size=5 and EarlyStopping (monitor="val_loss", patience=15); training ran for 25 epochs
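
The interaction between `patience` and the number of epochs trained can be illustrated with a small simulation. This is a simplified sketch of patience-based early stopping, not the Keras implementation: the function `simulate_early_stopping` and the validation-loss curve are hypothetical. If the run above ended because of early stopping, 25 epochs with a patience of 15 would be consistent with the best `val_loss` occurring around epoch 10, but the notebook does not record this.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (last_epoch_run, best_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs pass without a new
    minimum of the validation loss.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical curve: improves for 10 epochs, then never beats its minimum.
val_losses = [1.0 - 0.05 * i for i in range(10)] + [0.9] * 290
stopped_epoch, best_epoch = simulate_early_stopping(val_losses, patience=15)
print("stopped at epoch", stopped_epoch, "with best epoch", best_epoch)
```

Since `ModelCheckpoint` is configured with `save_best_only=True`, the weights file would hold the model from the best epoch rather than from the last one.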