# Bioinformatics Model 

## Training Using Watson Machine Learning - Tutorial 4

## Pre-requisites:

Access to an IBM Cloud Object Storage instance

Access to a WML instance running on the IBM Cloud

The e2eai_credentials.json file (included in the repo clone) in your local directory updated with credentials for the cloud object storage and WML instances

A copy of ICOS.py (included in the repo clone) in the local directory

The ibm-cos-sdk Python package installed in your Python execution environment.

The ibm-watson-machine-learning Python package installed in your Python execution environment (the notebook imports `APIClient` from `ibm_watson_machine_learning`).

A WML deployment space and its space ID

### Contact:  fjgreco@us.ibm.com

#### <font color=red>Optional installation of ibm-cos-sdk</font>

#### <font color=red>Optional installation of ibm-watson-machine-learning</font>

### Run the following  if in CP4D

## Import packages

In [1]:
import json
import os
import wget

In [2]:
from ICOS import ICOS

In [3]:
from ibm_watson_machine_learning import APIClient

### Obtain access credentials

In [4]:
with open("e2eai_credentials.json") as json_file:
    credentials = json.load(json_file)

wml_credentials=credentials['wml_credentials_e2eai']

icos_credentials=credentials['icos_credentials_e2eai']
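
Before the credentials are used, it can help to confirm that the file has the expected shape. The following is a minimal sketch, assuming only the keys this notebook actually reads: `wml_credentials_e2eai`, `icos_credentials_e2eai`, and the `cos_hmac_keys` pair used later in the training metadata. `missing_credential_keys` is a hypothetical helper, not part of ICOS.py or the WML client, and the sample values are placeholders.

```python
def missing_credential_keys(credentials):
    """Return the dotted paths of keys this notebook needs but cannot find."""
    required_paths = [
        ("wml_credentials_e2eai",),
        ("icos_credentials_e2eai", "cos_hmac_keys", "access_key_id"),
        ("icos_credentials_e2eai", "cos_hmac_keys", "secret_access_key"),
    ]
    missing = []
    for path in required_paths:
        node = credentials
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Example with placeholder values (not real credentials).
sample = {
    "wml_credentials_e2eai": {"url": "placeholder"},
    "icos_credentials_e2eai": {"cos_hmac_keys": {"access_key_id": "placeholder"}},
}
print(missing_credential_keys(sample))
```

An empty list means every key the later cells depend on is present; it says nothing about whether the values themselves are valid.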


###  Manifest handling

### Approach 1: Create a local manifest here

In [5]:
experiment_manifest={'zip_file': 'tf_model_v8T.zip',  #<----specify from build
 'git_url': 'https://github.com/fjgreco/e2eai_assay/blob/master/tf_model_v8T.zip?raw=true', #<---specify from build
 'neural_network_pgm': 'tf_model_v8T/neural_network_v8T.py', #<----specify from build
 'training_definition_name': 'mac-bioinformatics-training-definition_v4nn8T',
 'training_run_name': 'mac-bioinformatics-training-run_v4nn8T',
 'trained_model_name': 'mac_bioinformatics_model_v4nn8T',
 'compressed_recompiled_model': 'mac_recompiled_bioinformatics_model_v4nn8T.tgz',
 'recompiled_model_h5': 'mac_recompiled_bioinformatics_model.h5',
 'deployment_name': 'mac_bioinformatics_deployment_v4nn8T',
 'training_bucket': 'e2eai-training',   #<---------- Override with your bucket name
 'results_bucket': 'e2eai-results-060621',   #<----------- Override with your bucket name
 'model_location': None}

### Approach 2: Read from json file

<font color=red>Turned off for tutorial simplification</font>
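
Although this approach is turned off in the tutorial, reading the manifest from a file could look roughly like the sketch below. It is only an illustration: `load_manifest` is a hypothetical helper, and the list of required keys is an assumption based on the manifest entries that later cells read.

```python
import json


def load_manifest(path, required=("zip_file", "git_url", "neural_network_pgm",
                                  "training_definition_name",
                                  "training_bucket", "results_bucket")):
    """Load a manifest from a JSON file and check the keys this notebook reads."""
    with open(path) as manifest_file:
        manifest = json.load(manifest_file)
    absent = [key for key in required if key not in manifest]
    if absent:
        raise KeyError("Manifest {} is missing keys: {}".format(path, absent))
    return manifest
```

A call such as `experiment_manifest = load_manifest("experiment_manifest.json")` would then replace the literal dictionary from Approach 1; the file name here is an assumption. JSON `null` loads as Python `None`, so an entry like `'model_location': None` survives the round trip.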

In [6]:
experiment_manifest

{'zip_file': 'tf_model_v8T.zip',
 'git_url': 'https://github.com/fjgreco/e2eai_assay/blob/master/tf_model_v8T.zip?raw=true',
 'neural_network_pgm': 'tf_model_v8T/neural_network_v8T.py',
 'training_definition_name': 'mac-bioinformatics-training-definition_v4nn8T',
 'training_run_name': 'mac-bioinformatics-training-run_v4nn8T',
 'trained_model_name': 'mac_bioinformatics_model_v4nn8T',
 'compressed_recompiled_model': 'mac_recompiled_bioinformatics_model_v4nn8T.tgz',
 'recompiled_model_h5': 'mac_recompiled_bioinformatics_model.h5',
 'deployment_name': 'mac_bioinformatics_deployment_v4nn8T',
 'training_bucket': 'e2eai-training',
 'results_bucket': 'e2eai-results-060621',
 'model_location': None}

## Step 2: Download the sample model-building code into the notebook's working directory:

In [7]:
zip_filename = experiment_manifest["zip_file"]
url= experiment_manifest["git_url"]
# NOTE: If you re-run this cell after changing the model or adding your own custom model,
# make sure the new archive is the one actually downloaded: an existing local file skips the download.
if not os.path.isfile(zip_filename):
    print("File {} not found, downloading from {}".format(zip_filename, url))
    wget.download(url, out=zip_filename)
else:
    print('File: {} is in local directory'.format(zip_filename))

File: tf_model_v8T.zip is in local directory


### Optional: Check the local file system to confirm the download from github
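
One way to perform this check from Python, rather than from the shell, is sketched below. It uses only the standard library; `describe_archive` is a hypothetical helper name, and the check assumes the download is an ordinary zip archive.

```python
import os
import zipfile


def describe_archive(zip_path):
    """Return the member names of a zip archive, or raise if it is missing or invalid."""
    if not os.path.isfile(zip_path):
        raise FileNotFoundError("{} is not in the local directory".format(zip_path))
    if not zipfile.is_zipfile(zip_path):
        # A failed download can leave an HTML error page under the expected name.
        raise ValueError("{} exists but is not a valid zip archive".format(zip_path))
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()
```

Calling `describe_archive(experiment_manifest["zip_file"])` should return a list that includes the program named in `experiment_manifest["neural_network_pgm"]`, since that path is what the training command runs.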

## Step 3: Train the model

###  Instantiate a WML client object:

In [8]:
client = APIClient(wml_credentials) 
client.version

'1.0.57'

###  Instantiate an IBM COS client object:

In [9]:
icos=ICOS(icos_credentials=icos_credentials)

### <font color=blue>Create a results bucket</font>

In [10]:
res=icos.create_bucket(experiment_manifest['results_bucket'])
print(res)

<class 'Exception'> An error occurred (BucketAlreadyExists) when calling the CreateBucket operation: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.
None


#### <font color=red>If you receive a BucketAlreadyExists exception message, the name is already taken. You may proceed only if that bucket exists in your own ICOS instance; otherwise, choose a different results bucket name in the manifest.</font>
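
Because the bucket namespace is shared by all users, as the error message states, a generic name is likely to collide. The sketch below shows one way to derive a name that is unlikely to be taken by appending a random suffix. `unique_bucket_name` is a hypothetical helper, and the lowercase conversion and 63-character limit assume the usual S3-style bucket naming constraints.

```python
import uuid


def unique_bucket_name(prefix, suffix_length=8, max_length=63):
    """Append a random hex suffix to a prefix to reduce the chance of a name collision."""
    suffix = uuid.uuid4().hex[:suffix_length]
    # Reserve room for the dash and suffix so the result stays within max_length.
    trimmed = prefix.lower()[: max_length - suffix_length - 1].rstrip("-")
    return "{}-{}".format(trimmed, suffix)


print(unique_bucket_name("e2eai-results"))
```

The generated name would then go into `experiment_manifest['results_bucket']` before the bucket is created. The manifest above takes a similar approach by hand, with a date-like suffix on `e2eai-results-060621`.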

### <font color=blue>Set deployment space</font>

In [11]:
E2EAI_WS_space_id = '36125782-4474-44a9-bc9f-8e081c7d8f73'  

In [12]:
space_id=E2EAI_WS_space_id

In [13]:
client.set.default_space(space_id)

'SUCCESS'

## Set up the WML training run

In [14]:
sequence_file='assay/assay_data_full.seq'   #Obtained from synthetic assay run
label_file='assay/assay_data_full.lbl'     #Obtained from synthetic assay run

In [15]:
command="python3 {} --sequencesFile ${{DATA_DIR}}/{} --labelsFile ${{DATA_DIR}}/{}".format(experiment_manifest["neural_network_pgm"],sequence_file,label_file)
command

'python3 tf_model_v8T/neural_network_v8T.py --sequencesFile ${DATA_DIR}/assay/assay_data_full.seq --labelsFile ${DATA_DIR}/assay/assay_data_full.lbl'
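
The doubled braces in the format string are escapes: `str.format` turns `{{` and `}}` into literal braces, so `${{DATA_DIR}}` is emitted as `${DATA_DIR}` and left for the training environment to resolve (the training log later in this notebook reports a `DATA_DIR` value). The sketch below repeats the construction with the values from the cells above and shows an equivalent f-string; the variable names `with_format` and `with_fstring` are illustrative only.

```python
neural_network_pgm = "tf_model_v8T/neural_network_v8T.py"
sequence_file = "assay/assay_data_full.seq"
label_file = "assay/assay_data_full.lbl"

# str.format version, as in the cell above: {{ and }} become literal braces.
with_format = "python3 {} --sequencesFile ${{DATA_DIR}}/{} --labelsFile ${{DATA_DIR}}/{}".format(
    neural_network_pgm, sequence_file, label_file
)

# Equivalent f-string: the same doubling rule applies.
with_fstring = (
    f"python3 {neural_network_pgm} "
    f"--sequencesFile ${{DATA_DIR}}/{sequence_file} "
    f"--labelsFile ${{DATA_DIR}}/{label_file}"
)

assert with_format == with_fstring
print(with_format)
```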

In [16]:
metaprops = {
    client.model_definitions.ConfigurationMetaNames.NAME: experiment_manifest['training_definition_name'],
    client.model_definitions.ConfigurationMetaNames.DESCRIPTION: "BIOINFORMATICS4V6NN",
    client.model_definitions.ConfigurationMetaNames.COMMAND: command,
    client.model_definitions.ConfigurationMetaNames.PLATFORM: {"name": "python", "versions": ["3.6"]},
    client.model_definitions.ConfigurationMetaNames.VERSION: "2.0",
    client.model_definitions.ConfigurationMetaNames.SPACE_UID: space_id
}

In [17]:
model_definition_details = client.model_definitions.store(experiment_manifest["zip_file"], meta_props=metaprops)

In [18]:
model_definition_id = client.model_definitions.get_uid(model_definition_details)   
print(model_definition_id)

ba8227ff-66dc-48c6-9c58-1757f854c033


In [19]:
training_metadata = {
    client.training.ConfigurationMetaNames.NAME: "BIOINFO",
    client.training.ConfigurationMetaNames.SPACE_UID: space_id,
    client.training.ConfigurationMetaNames.DESCRIPTION: "Transcription Factor Model",
    client.training.ConfigurationMetaNames.TAGS: [{
        "value": 'BIOINFO',
        "description": "predict binding property"
    }],
    client.training.ConfigurationMetaNames.TRAINING_RESULTS_REFERENCE: {
        "connection": {
            "endpoint_url": "https://s3.us.cloud-object-storage.appdomain.cloud",
            "access_key_id": icos_credentials['cos_hmac_keys']['access_key_id'],
            "secret_access_key": icos_credentials['cos_hmac_keys']['secret_access_key']
        },
        "location": {
            "bucket": experiment_manifest['results_bucket'],
        },
        "type": "s3"
    },
    client.training.ConfigurationMetaNames.TRAINING_DATA_REFERENCES: [{
        "connection": {
            "endpoint_url": "https://s3.us.cloud-object-storage.appdomain.cloud",
            "access_key_id": icos_credentials['cos_hmac_keys']['access_key_id'],
            "secret_access_key": icos_credentials['cos_hmac_keys']['secret_access_key']
        },
        "location": {
            "bucket": experiment_manifest['training_bucket'],
        },
        "type": "s3"
    }],
    client.training.ConfigurationMetaNames.MODEL_DEFINITION: {
        "id": model_definition_id,
        "command": command,
        "hardware_spec": {
            "name": "K80",
            "nodes": 1
        },
        "software_spec": {
            "name": "tensorflow_2.1-py3.7"
        },
        "parameters": {
            "name": "BIOINFO",
            "description": "Transcription Factor Model"
        }
    }
}
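
The results reference and the training data reference in `training_metadata` differ only in the bucket name. As a possible simplification, the repeated structure could be built by a small function. This is a sketch, not part of the WML client: `s3_reference` is a hypothetical helper, and the keys shown are placeholders.

```python
def s3_reference(bucket, hmac_keys,
                 endpoint_url="https://s3.us.cloud-object-storage.appdomain.cloud"):
    """Build one S3-style data reference in the shape used in training_metadata."""
    return {
        "connection": {
            "endpoint_url": endpoint_url,
            "access_key_id": hmac_keys["access_key_id"],
            "secret_access_key": hmac_keys["secret_access_key"],
        },
        "location": {"bucket": bucket},
        "type": "s3",
    }


# Placeholder keys for illustration only.
keys = {"access_key_id": "placeholder-id", "secret_access_key": "placeholder-secret"}
results_ref = s3_reference("e2eai-results-060621", keys)
training_refs = [s3_reference("e2eai-training", keys)]
```

In the notebook, `hmac_keys` would be `icos_credentials['cos_hmac_keys']` and the bucket names would come from `experiment_manifest`. Keeping the endpoint in one place also makes it harder for the two references to drift apart.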

## Step 4: Monitor training progress and results

## Start run and monitor training progress

In [20]:
from time import sleep

# Submit the training job and keep its ID for status queries.
training = client.training.run(training_metadata)
training_id = client.training.get_uid(training)

# Poll every 10 seconds until the run reaches a terminal state.
cts = client.training.get_status(training_id)['state']
while cts not in ['completed', 'failed', 'canceled', 'error']:
    print(cts, end=' ')
    sleep(10)
    cts = client.training.get_status(training_id)['state']

print(cts)

pending pending pending pending pending running pending pending pending running running running running running running running running running running running running running running running running completed
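
The polling loop above runs until the service reports a terminal state, so a stuck job would keep the cell busy indefinitely. The sketch below shows the same pattern with a timeout added. It is an illustration under stated assumptions: `wait_for_terminal_state` is a hypothetical helper, the one-hour default is arbitrary, and a canned sequence of states stands in for the WML service so the example runs offline.

```python
import time


def wait_for_terminal_state(get_state, poll_seconds=10, timeout_seconds=3600,
                            terminal_states=("completed", "failed", "canceled", "error"),
                            sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns a terminal state or the timeout elapses."""
    deadline = clock() + timeout_seconds
    state = get_state()
    while state not in terminal_states:
        if clock() >= deadline:
            raise TimeoutError("Still '{}' after {} seconds".format(state, timeout_seconds))
        sleep(poll_seconds)
        state = get_state()
    return state


# Simulated run: a canned sequence of states stands in for the WML service.
states = iter(["pending", "pending", "running", "completed"])
final = wait_for_terminal_state(lambda: next(states), sleep=lambda seconds: None)
print(final)
```

With the real client, `get_state` could be `lambda: client.training.get_status(training_id)['state']`, the same call the cell above uses.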


## Review Results
### Check run details...

In [None]:
ctd=client.training.get_details(training_id) 
print(json.dumps(ctd,indent=2))

### model.h5, model.tgz, model.json, and model_weights.h5 were placed in ICOS by the Keras Python program

### <font color=blue>Extract the folder in the results bucket that contains the training output. Retain the folder name for use in the salient notebook</font>

In [22]:
model_location= ctd['entity']['results_reference']['location']['logs']
model_location

'training-j5ADjf6Gg'

In [23]:
dl2=icos.get_download_list_loc(experiment_manifest['results_bucket'],model_location,results_folder='.')

Retrieving relevant bucket contents from: e2eai-results-060621 Model_location: training-j5ADjf6Gg

training-j5ADjf6Gg/bioinformatics_model.h5
training-j5ADjf6Gg/bioinformatics_model.json
training-j5ADjf6Gg/bioinformatics_model.tgz
training-j5ADjf6Gg/bioinformatics_model_cm.p
training-j5ADjf6Gg/bioinformatics_model_history.p
training-j5ADjf6Gg/bioinformatics_model_weights.h5
training-j5ADjf6Gg/training-log.txt


## Review the training log

In [24]:
!cat training-log.txt

Training with training/test data at:
  DATA_DIR: /mnt/data/e2eai-training
  MODEL_DIR: /job/model-code
  TRAINING_JOB: 
  TRAINING_COMMAND: python3 tf_model_v8T/neural_network_v8T.py --sequencesFile ${DATA_DIR}/assay/assay_data_full.seq --labelsFile ${DATA_DIR}/assay/assay_data_full.lbl
Storing trained model at:
  RESULT_DIR: /mnt/results/e2eai-results-060621/training-j5ADjf6Gg
Mon Jun  7 02:33:00 UTC 2021: Running Tensorflow job
2021-06-07 02:33:00.646685: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2021-06-07 02:33:01.315206: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.7
2021-06-07 02:33:01.316041: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.7
2021-06-07 02:33:03.718604: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dy
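
The indented lines at the top of the log record where the job read its data and wrote its results. As a rough sketch, they can be pulled into a dictionary for later use; `parse_log_header` is a hypothetical helper, and the pattern assumes the header keeps the indented `NAME: value` layout shown above.

```python
import re


def parse_log_header(log_text):
    """Collect the indented 'NAME: value' lines from the top of a training log."""
    settings = {}
    for line in log_text.splitlines():
        match = re.match(r"^\s+([A-Z_]+):\s*(.*)$", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


sample_log = """Training with training/test data at:
  DATA_DIR: /mnt/data/e2eai-training
  MODEL_DIR: /job/model-code
  TRAINING_JOB:
Storing trained model at:
  RESULT_DIR: /mnt/results/e2eai-results-060621/training-j5ADjf6Gg
"""
header = parse_log_header(sample_log)
print(header["DATA_DIR"])
print(header["RESULT_DIR"])
```

For the downloaded file, the input would be the contents of `training-log.txt`. The timestamped TensorFlow lines are skipped because they do not start with whitespace.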

##  <font color=green>Proceed to running e2eai-bioinformatics-analysis(tutorial).ipynb...</font>