<a href="https://colab.research.google.com/github/informatics-isi-edu/eye-ai-exec/blob/main/notebooks/VGG19_Diagnosis_Predict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# VGG19 Model Application

This notebook applies a pre-trained VGG19 model to the dataset specified in the configuration file and uploads the predicted labels to the catalog. The ROC curve is also calculated and uploaded.

In [1]:
# Prerequisites to configure Colab
import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    !pip install deriva
    !pip install bdbag
    !pip install --upgrade --force pydantic
    !pip install git+https://github.com/informatics-isi-edu/deriva-ml git+https://github.com/informatics-isi-edu/eye-ai-ml


In [2]:
repo_dir = "Repos"   # Set this to the directory (relative to your home directory) where your GitHub repos are located.
%load_ext autoreload
%autoreload 2

# Update the load path so python can find modules for the model
import sys
from pathlib import Path
sys.path.insert(0, str(Path.home() / repo_dir / "eye-ai-ml"))

In [3]:
import json
import os
import eye_ai
from eye_ai.eye_ai import EyeAI
import pandas as pd
from pathlib import Path, PurePath
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

In [4]:
# @title Log in to DERIVA via Globus Auth

DEBUG_MODE = False #@param ["False", "True"] {type:"raw"}
catalog_id = "eye-ai" #@param
DEFAULT_SERVER = 'dev.eye-ai.org' if DEBUG_MODE else 'www.eye-ai.org'

!deriva-globus-auth-utils login --no-browser --host {DEFAULT_SERVER}


You are already logged in.


Connect to the Eye-AI catalog and configure the local cache and working directories used to store data. Then initialize Eye-AI for the pending execution based on the provided configuration file.

In [5]:
# Variables to configure the rest of the notebook.

cache_dir = '/data'        # Directory in which to cache materialized BDBags for datasets
working_dir = 'working'    # Directory in which to place output files for later upload.

configuration_rid="2-A5KC"      # Configuration file for this run.  Needs to be changed for each execution.

In [6]:
EA = EyeAI(hostname = DEFAULT_SERVER, catalog_id = catalog_id, cache_dir= cache_dir, working_dir=working_dir)
print(f" Initializing model version: {EA.version}")

2024-04-29 11:00:50,742 - INFO - Creating client of type <class 'globus_sdk.services.auth.client.native_client.NativeAppAuthClient'> for service "auth"
2024-04-29 11:00:50,742 - INFO - Finished initializing AuthLoginClient. client_id='8ef15ba9-2b4a-469c-a163-7fd910c9d111', type(authorizer)=<class 'globus_sdk.authorizers.base.NullAuthorizer'>


 Initializing model version: 1.1.0.post1+git.68f4582d.dirty


In [7]:
# @title Initiate an Execution
configuration_records = EA.execution_init(configuration_rid=configuration_rid)
input_dataset = configuration_records.assets_paths[0] # First asset listed in the configuration (the model file in the output below). Assumes the configuration specifies only one asset.
configuration_records.model_dump()

2024-04-29 11:00:53,362 - INFO - File [working/Execution_Metadata/Execution_Config-diagnosis_insert.json] transfer successful. 0.67 KB transferred. Elapsed time: 0:00:00.000058.
2024-04-29 11:00:53,363 - INFO - Verifying SHA256 checksum for downloaded file [working/Execution_Metadata/Execution_Config-diagnosis_insert.json]
2024-04-29 11:00:53,388 - INFO - Configuration validation successful!
2024-04-29 11:00:54,483 - INFO - Attempting to resolve http://identifiers.org/minid:zTfxqlUm0T9S into a valid set of URLs.
2024-04-29 11:00:55,718 - INFO - The identifier minid:zTfxqlUm0T9S resolved into the following locations: [https://eye-ai-shared.s3.amazonaws.com//6656e08709b2f7c0bbed8284fdfb550d/2024-03-20_18.02.51/Dataset_2-7KA2.zip]
2024-04-29 11:00:55,721 - INFO - Attempting GET from URL: https://eye-ai-shared.s3.amazonaws.com//6656e08709b2f7c0bbed8284fdfb550d/2024-03-20_18.02.51/Dataset_2-7KA2.zip
2024-04-29 11:00:55,848 - INFO - File [/data/2-7KA2_385efaeefabd04cd21e1ef0a4a77e34b50f7f5bf

{'vocabs': {'Workflow_Type': [{'name': 'Diagnosis', 'rid': '2-7J8T'}],
  'Diagnosis_Tag': [{'name': 'CNN_Prediction', 'rid': '2-7KBR'}]},
 'execution_rid': '2-A77C',
 'workflow_rid': '2-A5QA',
 'bag_paths': [PosixPath('/data/2-7KA2_385efaeefabd04cd21e1ef0a4a77e34b50f7f5bf074df7d22e4d51a1142496fc/Dataset_2-7KA2')],
 'assets_paths': [PosixPath('working/Execution_Assets/diagnosis_model.h5')],
 'configuration_path': PosixPath('working/Execution_Metadata/Execution_Config-diagnosis_insert.json')}

The algorithm was trained on cropped images, so crop the raw images using their bounding boxes and store the results in the working directory.

In [8]:
# @title Get Cropped Images
cropped_image_path, cropped_csv = EA.create_cropped_images(str(configuration_records.bag_paths[0]),
                                                           output_dir = working_dir,
                                                           crop_to_eye=False)
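
To illustrate what cropping to a bounding box involves, here is a minimal, standard-library sketch. It is not the `create_cropped_images` implementation, whose internals are not shown in this notebook. The image representation (rows of grayscale values), the `(x_min, y_min, x_max, y_max)` box layout, and the `crop_to_box` helper are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical stand-in for an image: a list of rows of grayscale pixel values.
Image = List[List[int]]
# Hypothetical box layout: (x_min, y_min, x_max, y_max), max edges exclusive.
Box = Tuple[int, int, int, int]


def crop_to_box(image: Image, box: Box) -> Image:
    """Return the part of `image` inside `box`, clamped to the image bounds."""
    x_min, y_min, x_max, y_max = box
    height = len(image)
    width = len(image[0]) if height else 0

    # Clamp the box so a slightly oversized annotation cannot index outside the image.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    if x_min >= x_max or y_min >= y_max:
        raise ValueError("bounding box does not overlap the image")

    # Slice the rows first, then the columns within each kept row.
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# A 4x5 toy image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(5)] for r in range(4)]
cropped = crop_to_box(image, (1, 1, 4, 3))
print(cropped)
```

In a real pipeline the same idea would typically be applied with an imaging library rather than nested lists; the clamping step matters because annotated boxes can extend past the image edge.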

Import the model code, then run it against the input dataset specified in the configuration file.

In [None]:
# @title Execute Process algorithm (Test model)
from eye_ai.models.vgg19_diagnosis_predict import prediction

# Run the prediction inside the execution context; `execution` avoids shadowing the built-in `exec`.
with EA.execution(execution_rid=configuration_records.execution_rid) as execution:
    output_path = EA.execution_assets_path/Path("Model_Prediction")
    pred_csv_path = prediction(input_dataset, cropped_image_path, output_path)

2024-04-29 11:02:37.911734: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-29 11:02:37.911783: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-29 11:02:37.912534: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-29 11:02:37.918635: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [None]:
# @title Plot ROC.
roc_value_path = EA.plot_roc(pred_csv_path)
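
For reference, the sketch below shows how ROC points and the area under the curve can be computed from binary labels and prediction scores. It is not the `EA.plot_roc` implementation, and the column layout of the prediction CSV is not shown in this notebook, so the example uses made-up `labels` and `scores` lists and hypothetical helper names (`roc_points`, `auc`).

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative label")

    # Sweep the threshold from the highest score down to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up example data: 1 = disease present, 0 = absent.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
auc_value = auc(points)
print(points, auc_value)
```

Grouping tied scores before emitting a point keeps the curve independent of the order in which equal scores happen to be sorted.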

Add the new labels to the catalog using the diagnosis tag provided for this execution. Also upload any additional assets that were produced by this execution.

In [None]:
# @title Save Diagnosis
import re
pred_df = pd.read_csv(pred_csv_path)
pred_df['Image'] = pred_df['Filename'].apply(lambda x: re.search(r'Cropped_(.*?)\.', x).group(1))
# The input dataframe needs two columns: Image (containing the image RID) and Prediction (containing 0/1)
EA.insert_new_diagnosis(pred_df,
                        configuration_records.vocabs['Diagnosis_Tag'][0].rid,
                        configuration_records.execution_rid)
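
The regular expression in the cell above takes everything between `Cropped_` and the first following period as the image RID. The sketch below shows that behavior in isolation, with a guard for filenames that do not match (the cell above would raise `AttributeError` in that case). The filenames and the `extract_image_rid` helper are hypothetical; the actual naming of the cropped files is only inferred from the pattern.

```python
import re
from typing import Optional

# Same pattern as in the notebook cell: non-greedy capture up to the first period.
_CROPPED_RID = re.compile(r"Cropped_(.*?)\.")


def extract_image_rid(filename: str) -> Optional[str]:
    """Return the text between 'Cropped_' and the next '.', or None if absent."""
    match = _CROPPED_RID.search(filename)
    return match.group(1) if match else None


# Hypothetical filenames, for illustration only.
examples = ["Cropped_2-ABCD.jpg", "working/Cropped_2-WXYZ.png", "unexpected_name.jpg"]
rids = [extract_image_rid(name) for name in examples]
print(rids)
```

Returning `None` for unmatched names makes it possible to drop or report those rows before the dataframe is passed on for insertion.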


In [None]:
# @title Save Execution Assets (model) and Metadata
uploaded_assets = EA.execution_upload(configuration_records.execution_rid, False)
