<a href="https://colab.research.google.com/github/informatics-isi-edu/facebase-ml-exec/blob/main/notebooks/VGG19_Diagnosis_Predict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


This notebook applies a pre-trained model to a dataset specified in the configuration file and uploads the resulting labels to the catalog. The ROC curve is also calculated and uploaded.

In [1]:
repo_dir = "Repos"   # Set this to be where your github repos are located.
%load_ext autoreload
%autoreload 2

# Update the load path so python can find modules for the model
import sys
from pathlib import Path
sys.path.insert(0, str(Path.home() / repo_dir / "facebase-ml"))

In [2]:
import json
import os
from facebase_ml.facebase_ml import FaceBaseML
import pandas as pd
from pathlib import Path, PurePath
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

2024-05-01 19:54:07.403200: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [3]:
from deriva.core.utils.globus_auth_utils import GlobusNativeLogin
catalog_id = "fb-ml" #@param
host = 'ml.facebase.org'


gnl = GlobusNativeLogin(host=host)
if gnl.is_logged_in([host]):
    print("You are already logged in.")
else:
    gnl.login([host], no_local_server=True, no_browser=True, refresh_tokens=True, update_bdbag_keychain=True)
    print("Login Successful")

2024-05-01 19:54:10,316 - INFO - Creating client of type <class 'globus_sdk.services.auth.client.native_client.NativeAppAuthClient'> for service "auth"
2024-05-01 19:54:10,317 - INFO - Finished initializing AuthLoginClient. client_id='8ef15ba9-2b4a-469c-a163-7fd910c9d111', type(authorizer)=<class 'globus_sdk.authorizers.base.NullAuthorizer'>


You are already logged in.


Connect to the FaceBase ML catalog. Configure the local cache and working directories used to store data. Initialize the `FaceBaseML` instance for the pending execution based on the provided configuration file.

In [4]:
# Variables to configure the rest of the notebook.

cache_dir = '/data'        # Directory in which to cache materialized BDBags for datasets
working_dir = '/data'    # Directory in which to place output files for later upload.
# Change to your username on the VM

configuration_rid="58-TW70"      # Configuration file for this run.  Needs to be changed for each execution.

In [5]:
FB = FaceBaseML(hostname = host, catalog_id = catalog_id, cache_dir= cache_dir, working_dir=working_dir)
print(f" Initializing model version: {FB.version}")

2024-05-01 19:54:12,284 - INFO - Creating client of type <class 'globus_sdk.services.auth.client.native_client.NativeAppAuthClient'> for service "auth"
2024-05-01 19:54:12,285 - INFO - Finished initializing AuthLoginClient. client_id='8ef15ba9-2b4a-469c-a163-7fd910c9d111', type(authorizer)=<class 'globus_sdk.authorizers.base.NullAuthorizer'>
2024-05-01 19:54:12,855 - INFO - Loading dirty model.  Consider commiting and tagging: 1.1.0.post2+git.89d983dd.dirty


 Initializing model version: 1.1.0.post2+git.89d983dd.dirty


In [6]:
# @title Initiate an Execution
configuration_records = FB.execution_init(configuration_rid=configuration_rid)
input_dataset = configuration_records.bag_paths[0] # First bag path only; this configuration lists three (used below as train, validation, test).
configuration_records.model_dump()

2024-05-01 19:54:15,466 - INFO - File [/data/maryamahmadii/FaceBaseML_working/Execution_Metadata/Execution_Config-MusMorph_train_10pct.json] transfer successful. 0.74 KB transferred. Elapsed time: 0:00:00.000055.
2024-05-01 19:54:15,467 - INFO - Verifying MD5 checksum for downloaded file [/data/maryamahmadii/FaceBaseML_working/Execution_Metadata/Execution_Config-MusMorph_train_10pct.json]
2024-05-01 19:54:15,485 - INFO - Configuration validation successful!


{'caching_dir': PosixPath('/data'),
 'working_dir': PosixPath('/data/maryamahmadii/FaceBaseML_working'),
 'vocabs': {'Workflow_Type': [{'name': 'Model Training', 'rid': '58-TC9E'}],
  'Execution_Asset_Type': [{'name': 'Model', 'rid': '58-TC9G'}]},
 'execution_rid': '58-TWJ8',
 'workflow_rid': '58-TC9M',
 'bag_paths': [PosixPath('/data/58-TC6A_dbe4619ed2511dd8fae878901edefa38419604b264c4c5d3ad32e9d3c5f5944c/Dataset_58-TC6A'),
  PosixPath('/data/58-TCBG_c602a1818329e7ccc455c931f92b767b57e796e6e6ffda510f18f98e6ebf8d9c/Dataset_58-TCBG'),
  PosixPath('/data/58-TCBR_076bc7f6a9e2ee7dad26971e52af4daf36fef312384a327e0368dbfdd1f5ecae/Dataset_58-TCBR')],
 'assets_paths': [],
 'configuration_path': PosixPath('/data/maryamahmadii/FaceBaseML_working/Execution_Metadata/Execution_Config-MusMorph_train_10pct.json')}
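
The dump above lists three entries under `bag_paths`, and the following cells take them by position as the training, validation, and test datasets. That ordering is a property of this particular configuration rather than something the notebook checks. A minimal sketch of a guard, assuming the three-way order holds (the helper name `split_bag_paths` is hypothetical and not part of `facebase_ml`):

```python
from pathlib import Path
from typing import Sequence, Tuple


def split_bag_paths(bag_paths: Sequence[Path]) -> Tuple[Path, Path, Path]:
    """Return (train, validation, test) paths, failing loudly on a bad count.

    Hypothetical helper: it assumes the configuration lists the datasets
    in train, validation, test order, as the run shown above does.
    """
    if len(bag_paths) != 3:
        raise ValueError(
            f"expected 3 bag paths (train, validation, test), got {len(bag_paths)}"
        )
    train, valid, test = bag_paths
    return train, valid, test


# Shortened, made-up paths standing in for configuration_records.bag_paths
example = [Path("/data/bag_a"), Path("/data/bag_b"), Path("/data/bag_c")]
train_dir, valid_dir, test_dir = split_bag_paths(example)
print(train_dir, valid_dir, test_dir)
```

Failing early here is cheaper than discovering a missing split after the datasets have been materialized.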

In [8]:
train_base_dir = configuration_records.bag_paths[0]
valid_base_dir = configuration_records.bag_paths[1]
test_base_dir = configuration_records.bag_paths[2]

biosample_filename = 'data/biosample.csv'
genotype_filename = 'data/genotype.csv'
Train_output_filename = FB.working_dir/'Train_mapped_file.csv'
Val_output_filename = FB.working_dir/'Val_mapped_file.csv'
Test_output_filename = FB.working_dir/'Test_mapped_file.csv'


Train_df, Train_mapped_file = FB.join_and_save_csv(train_base_dir, biosample_filename, genotype_filename, Train_output_filename)
Val_df, Val_mapped_file = FB.join_and_save_csv(valid_base_dir, biosample_filename, genotype_filename, Val_output_filename)
Test_df, Test_mapped_file = FB.join_and_save_csv(test_base_dir, biosample_filename, genotype_filename, Test_output_filename)
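
The internals of `FB.join_and_save_csv` are not shown in this notebook. As an illustration of the general idea only (attaching genotype information to each biosample through a shared key), here is a small standard-library sketch. The column names `RID`, `genotype`, and `name` and the sample rows are assumptions made up for the example, not the real schema of `biosample.csv` or `genotype.csv`.

```python
import csv
import io

# Made-up miniature tables; the real column names are not shown in the
# notebook, so everything below is an assumption for illustration.
biosample_csv = "RID,genotype\nB1,G1\nB2,G2\nB3,G1\n"
genotype_csv = "RID,name\nG1,wild type\nG2,mutant\n"


def join_rows(biosample_text: str, genotype_text: str) -> list:
    """Attach a genotype name to each biosample row via a shared key."""
    genotype_by_rid = {
        row["RID"]: row["name"]
        for row in csv.DictReader(io.StringIO(genotype_text))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(biosample_text)):
        joined.append(
            {
                "biosample": row["RID"],
                # None marks a biosample whose genotype key has no match
                "genotype_name": genotype_by_rid.get(row["genotype"]),
            }
        )
    return joined


for record in join_rows(biosample_csv, genotype_csv):
    print(record)
```

With real data, a pandas merge on the key column would likely be the more natural tool, since the notebook already imports pandas.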


In [9]:
# Prepare datasets
dataset_manager = FaceBaseML()

csv_path = Train_mapped_file
images_folder_path = train_base_dir.joinpath('data/assets/Image')
image_paths, labels = dataset_manager.load_images_and_labels(csv_path, images_folder_path)
train_dataset = dataset_manager.prepare_dataset(image_paths, labels, batch_size=10, shuffle=False, augment_type= None)

csv_path = Val_mapped_file
images_folder_path = valid_base_dir.joinpath('data/assets/Image')
image_paths, labels = dataset_manager.load_images_and_labels(csv_path, images_folder_path)
validation_dataset = dataset_manager.prepare_dataset(image_paths, labels, batch_size=10, shuffle=False, augment_type= None)

csv_path = Test_mapped_file
images_folder_path = test_base_dir.joinpath('data/assets/Image')
image_paths, labels = dataset_manager.load_images_and_labels(csv_path, images_folder_path)
test_dataset = dataset_manager.prepare_dataset(image_paths, labels, batch_size=10, shuffle=False, augment_type= None)


2024-05-01 19:54:23,568 - INFO - Creating client of type <class 'globus_sdk.services.auth.client.native_client.NativeAppAuthClient'> for service "auth"
2024-05-01 19:54:23,568 - INFO - Finished initializing AuthLoginClient. client_id='8ef15ba9-2b4a-469c-a163-7fd910c9d111', type(authorizer)=<class 'globus_sdk.authorizers.base.NullAuthorizer'>
2024-05-01 19:54:23,895 - INFO - Loading dirty model.  Consider commiting and tagging: 1.1.0.post2+git.89d983dd.dirty
2024-05-01 19:54:24.055114: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355

### Optional sanity check: confirm that images and labels loaded correctly (otherwise skip ahead to the training process)

In [10]:
for images, labels in validation_dataset.take(1):
    print("Images shape:", images.shape) 
    print("Labels shape:", labels.shape)
    print("Images dtype:", images.dtype)
    print("Labels dtype:", labels.dtype)




Images shape: (10, 128, 128, 128, 1)
Labels shape: (10,)
Images dtype: <dtype: 'float32'>
Labels dtype: <dtype: 'int32'>


2024-05-01 18:03:12.567500: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


In [12]:
# Iterating through the dataset to count actual images
num_images = 0
for images, labels in test_dataset.unbatch():
    num_images += 1

print("Total number of images in test_dataset:", num_images)


Total number of images in test_dataset: 116


2024-05-01 18:10:34.927606: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
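
The image count ties directly to the number of steps Keras reports. With `batch_size=10` and 116 test images, the dataset yields 12 batches, which is the `12/12` shown in the evaluation progress bar later in the notebook. A short sketch of the arithmetic, assuming `prepare_dataset` keeps the final partial batch rather than dropping it:

```python
import math


def num_batches(num_examples: int, batch_size: int) -> int:
    """Number of batches when the final partial batch is kept."""
    return math.ceil(num_examples / batch_size)


# Test split: 116 images at batch_size=10 gives 11 full batches plus one of 6
print(num_batches(116, 10))
```

By the same reasoning, the 38 steps per training epoch imply between 371 and 380 training images.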


In [None]:
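# Optional smoke test: build the model and fit it on a few batches only.
# Run the callback-definition cell in the training section first so that
# `early_stopping` and `model_checkpoint` exist.
FB = FaceBaseML()
FB.build_3d_cnn_model()

# Assumption: the small datasets are not defined anywhere in this notebook;
# taking the first two batches of each split is one way to create them.
small_train_dataset = train_dataset.take(2)
small_val_dataset = validation_dataset.take(2)

# Train the model with the small datasets
history = FB.ml_model.fit(
    x=small_train_dataset,
    validation_data=small_val_dataset,
    epochs=2,
    callbacks=[early_stopping, model_checkpoint]
)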


## Training process starts here

In [11]:
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Define the base directory where the model checkpoints should be saved
base_dir = '/data/maryamahmadii/FaceBaseML_working'

# Define callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Define ModelCheckpoint
model_checkpoint = ModelCheckpoint(filepath=base_dir + '/best_model.keras', save_best_only=True)
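
One consequence of `patience=3` is worth spelling out: the `fit` calls in this notebook use `epochs=2`, so early stopping can never trigger here, because at least four epochs are needed (one to set the best value, then three without improvement). The sketch below simulates patience-based stopping in plain Python. It is a simplified model that ignores options such as `min_delta`, and the loss values are made up.

```python
def epochs_before_stop(val_losses, patience):
    """Simulate patience-based early stopping on per-epoch validation losses.

    Simplified model: stop once the loss has failed to improve on its best
    value for `patience` consecutive epochs. Returns the number of epochs run.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # training would stop after this epoch
    return len(val_losses)


# Made-up losses: the best value is at epoch 1, then three worse epochs
print(epochs_before_stop([0.5, 0.6, 0.7, 0.8, 0.9], patience=3))
# With only two epochs scheduled, the patience budget is never exhausted
print(epochs_before_stop([0.24, 0.30], patience=3))
```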


In [None]:
FB = FaceBaseML()
FB.build_3d_cnn_model()

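# The datasets are already batched (batch_size=10 in prepare_dataset), so no
# batch_size is passed to fit(); Keras expects it to be omitted for dataset inputs.
history = FB.ml_model.fit(
    x=train_dataset,
    validation_data=validation_dataset,
    epochs=2,
    callbacks=[early_stopping, model_checkpoint]
)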


2024-05-01 18:41:49,563 - INFO - Creating client of type <class 'globus_sdk.services.auth.client.native_client.NativeAppAuthClient'> for service "auth"
2024-05-01 18:41:49,564 - INFO - Finished initializing AuthLoginClient. client_id='8ef15ba9-2b4a-469c-a163-7fd910c9d111', type(authorizer)=<class 'globus_sdk.authorizers.base.NullAuthorizer'>
2024-05-01 18:41:50,020 - INFO - Loading dirty model.  Consider commiting and tagging: 1.1.0.post2+git.89d983dd.dirty


Epoch 1/2


  super().__init__(
I0000 00:00:1714588969.917612   25756 service.cc:145] XLA service 0x7f0390003fd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1714588969.928428   25756 service.cc:153]   StreamExecutor device (0): NVIDIA A10G, Compute Capability 8.6
2024-05-01 18:42:50.369106: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-05-01 18:42:51.119347: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:465] Loaded cuDNN version 8907
2024-05-01 18:43:03.964117: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng20{k2=8,k3=0} for conv (f32[16,1,3,3,3]{4,3,2,1,0}, u8[0]{0}) custom-call(f32[10,1,128,128,128]{4,3,2,1,0}, f32[10,16,126,126,126]{4,3,2,1,0}), window={size=3x3x3}, dim_labels=bf012_oi012->bf012, custom_call_target="__cudnn$convBackwardFilter", backend_config={"operation_queue_id"

1/38 ━━━━━━━━━━━━━━━━━━━━ 50:02 81s/step - accuracy: 0.2000 - loss: 0.6957

I0000 00:00:1714588991.170616   25756 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


38/38 ━━━━━━━━━━━━━━━━━━━━ 1687s 43s/step - accuracy: 0.7134 - loss: 0.5813 - val_accuracy: 0.9355 - val_loss: 0.2386
Epoch 2/2


## The kernel died after the first epoch; run the cells below if the kernel dies during training
Before continuing from this point, rerun every cell from the beginning through the creation of the training, validation, and test datasets.

In [9]:
from tensorflow.keras.models import load_model

base_dir = '/data/maryamahmadii/FaceBaseML_working'

checkpoint_path = base_dir + '/best_model.keras'
FB = FaceBaseML()
FB.ml_model = load_model(checkpoint_path)


2024-05-01 19:16:07,804 - INFO - Creating client of type <class 'globus_sdk.services.auth.client.native_client.NativeAppAuthClient'> for service "auth"
2024-05-01 19:16:07,805 - INFO - Finished initializing AuthLoginClient. client_id='8ef15ba9-2b4a-469c-a163-7fd910c9d111', type(authorizer)=<class 'globus_sdk.authorizers.base.NullAuthorizer'>
2024-05-01 19:16:08,234 - INFO - Loading dirty model.  Consider commiting and tagging: 1.1.0.post2+git.89d983dd.dirty


In [None]:
# Set up the callbacks again
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

early_stopping = EarlyStopping(monitor='val_loss', patience=3)
model_checkpoint = ModelCheckpoint(filepath=base_dir + '/best_model.keras', save_best_only=True)

# Resume training from the saved checkpoint
history = FB.ml_model.fit(
    x=train_dataset,
    validation_data=validation_dataset,
    epochs=2,  # Index of the final epoch, not the number of additional epochs
    initial_epoch=1,  # Zero-based index of the epoch at which to resume
    # batch_size is omitted because the dataset is already batched
    callbacks=[early_stopping, model_checkpoint]
)


Epoch 2/2


I0000 00:00:1714591150.258195    6439 service.cc:145] XLA service 0x7f1b5800d160 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1714591150.258254    6439 service.cc:153]   StreamExecutor device (0): NVIDIA A10G, Compute Capability 8.6
2024-05-01 19:19:10.475693: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-05-01 19:19:11.102601: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:465] Loaded cuDNN version 8907
2024-05-01 19:19:18.646036: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng20{k2=8,k3=0} for conv (f32[16,1,3,3,3]{4,3,2,1,0}, u8[0]{0}) custom-call(f32[10,1,128,128,128]{4,3,2,1,0}, f32[10,16,126,126,126]{4,3,2,1,0}), window={size=3x3x3}, dim_labels=bf012_oi012->bf012, custom_call_target="__cudnn$convBackwardFilter", backend_config={"operation_queue_id":"0","wait_on_operat

1/38 ━━━━━━━━━━━━━━━━━━━━ 37:01 60s/step - accuracy: 0.9000 - loss: 0.3427

I0000 00:00:1714591165.622157    6439 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


38/38 ━━━━━━━━━━━━━━━━━━━━ 0s 33s/step - accuracy: 0.8228 - loss: 0.5875
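
The resumed run prints `Epoch 2/2` because `initial_epoch` is zero-based while `epochs` names the final epoch, so `initial_epoch=1, epochs=2` trains exactly one more epoch. The following sketch reproduces that bookkeeping in plain Python; the helper name `epochs_to_run` is hypothetical.

```python
def epochs_to_run(initial_epoch: int, epochs: int) -> list:
    """Progress headers expected from a resumed fit() call.

    `initial_epoch` is zero-based and `epochs` is the index of the final
    epoch, so the run covers epochs initial_epoch + 1 through epochs
    when counted from one.
    """
    return [f"Epoch {i + 1}/{epochs}" for i in range(initial_epoch, epochs)]


print(epochs_to_run(initial_epoch=1, epochs=2))
```

Note also that a newly created `ModelCheckpoint` starts without a record of the previous best `val_loss`, so the first epoch that completes validation after resuming overwrites `best_model.keras` even if it is worse than the saved one. Keras provides the `initial_value_threshold` argument for carrying the earlier value over.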

### The kernel died again after the second epoch, so reload the model to evaluate the test dataset

In [10]:
from tensorflow.keras.models import load_model
base_dir = '/data/maryamahmadii/FaceBaseML_working'

checkpoint_path = base_dir + '/best_model.keras'

try:
    model = load_model(checkpoint_path)
    print("Checkpoint loaded successfully.")
except Exception as e:
    print("Error loading checkpoint:", str(e))


Checkpoint loaded successfully.


In [11]:
model.summary()

In [12]:
test_loss, test_accuracy = model.evaluate(test_dataset)
print(f"Test Loss: {test_loss}, Test Accuracy: {test_accuracy}")


I0000 00:00:1714593326.759683   20783 service.cc:145] XLA service 0x7f8ec0008f90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1714593326.759752   20783 service.cc:153]   StreamExecutor device (0): NVIDIA A10G, Compute Capability 8.6
2024-05-01 19:55:26.862302: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-05-01 19:55:27.271173: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:465] Loaded cuDNN version 8907


1/12 ━━━━━━━━━━━━━━━━━━━━ 7:23 40s/step - accuracy: 0.6000 - loss: 0.9851

I0000 00:00:1714593330.789457   20783 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


12/12 ━━━━━━━━━━━━━━━━━━━━ 395s 32s/step - accuracy: 0.8195 - loss: 0.5055
Test Loss: 0.30969393253326416, Test Accuracy: 0.9051724076271057
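
The reported test accuracy can be read back as a count of images. Assuming the metric is the plain fraction of correctly classified images over the 116 test images counted earlier, the value corresponds to 105 correct predictions:

```python
num_test_images = 116                   # counted earlier in the notebook
reported_accuracy = 0.9051724076271057  # value printed by model.evaluate

# Assumption: accuracy = correct predictions / total images
num_correct = round(reported_accuracy * num_test_images)
print(num_correct, "of", num_test_images, "correct")
print(f"{num_correct / num_test_images:.7f}")
```

The small difference between 105/116 and the printed value is consistent with the metric being stored in single precision.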


The algorithm was trained on cropped images, so apply the bounding boxes to the raw images and store the results in the working directory.

Import the actual model code and then run it against the input dataset specified in the configuration file.

In [None]:
# @title Execute Process algorithm (Test model)
# Placeholder from the original notebook: import the model code here, e.g.
# from facebase_ml_tools.models.some_file import <some model>

with FB.execution(execution_rid=configuration_records.execution_rid) as execution:
    output_path = FB.execution_assets_path / Path("Model_Prediction")


In [None]:
# @title Plot ROC.
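
The ROC cell is left empty in this notebook. As a rough sketch of what it would compute, the code below builds ROC points and the area under the curve in plain Python. It assumes binary labels with both classes present and one predicted score per image for the positive class; the labels and scores are made up, and in the notebook they would presumably come from the test labels and `model.predict(test_dataset)`. In practice, scikit-learn's `roc_curve` together with matplotlib would be the more usual choice.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr) by sweeping the threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example sharing this score (ties)
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels and predicted probabilities, for illustration only
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, y_score)
print(curve)
print(auc(curve))
```

The list of points can be passed directly to a plotting call, with the false positive rate on the x-axis and the true positive rate on the y-axis.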


Add the new labels to the catalog using the provided diagnosis tag for this execution. Also upload any additional assets that were produced by this execution.

In [None]:
# @title Save Diagnosis


In [None]:
# @title Save Execution Assets (model) and Metadata
uploaded_assets = FB.execution_upload(configuration_records.execution_rid, False)
