DerivaML is a class library built on the Deriva scientific asset management system. It is designed to simplify the basic operations involved in building and testing ML libraries based on common toolkits such as TensorFlow. This notebook reviews the basic features of the DerivaML library.

In [4]:
%load_ext autoreload
%autoreload 2

In [11]:
import pandas as pd
from deriva.core import DerivaServer, ErmrestCatalog, get_credential
from deriva.core.utils.globus_auth_utils import GlobusNativeLogin
from deriva_ml.deriva_ml_base import DerivaML, DerivaMLException, ColumnDefinition, BuiltinTypes
from deriva_ml.schema_setup.create_schema import create_ml_schema
from deriva_ml.schema_setup.test_catalog import create_test_catalog
from deriva_ml.execution_configuration import ExecutionConfiguration

Set the details for the catalog we want and authenticate to the server if needed.

In [13]:
hostname = 'dev.eye-ai.org'
domain_schema = 'demo-schema'

gnl = GlobusNativeLogin(host=hostname)
if gnl.is_logged_in([hostname]):
    print("You are already logged in.")
else:
    gnl.login([hostname], no_local_server=True, no_browser=True, refresh_tokens=True, update_bdbag_keychain=True)
    print("Login Successful")


2024-09-10 13:20:27,784 - DEBUG - on lookup, default setting: GLOBUS_SDK_ENVIRONMENT=production
2024-09-10 13:20:27,784 - INFO - Creating client of type <class 'globus_sdk.services.auth.client.native_client.NativeAppAuthClient'> for service "auth"
2024-09-10 13:20:27,785 - DEBUG - Service URL Lookup for "auth" under env "production"
2024-09-10 13:20:27,785 - DEBUG - Service URL Lookup Result: "auth" is at "https://auth.globus.org/"
2024-09-10 13:20:27,786 - DEBUG - on lookup, default setting: GLOBUS_SDK_VERIFY_SSL=True
2024-09-10 13:20:27,786 - DEBUG - on lookup, default setting: GLOBUS_SDK_HTTP_TIMEOUT=60.0
2024-09-10 13:20:27,786 - DEBUG - initialized transport of type <class 'globus_sdk.transport.requests.RequestsTransport'>
2024-09-10 13:20:27,787 - INFO - Finished initializing AuthLoginClient. client_id='8ef15ba9-2b4a-469c-a163-7fd910c9d111', type(authorizer)=<class 'globus_sdk.authorizers.base.NullAuthorizer'>
2024-09-10 13:20:27,787 - DEBUG - Using code handlers (<fair_research_

You are already logged in.


Create a test catalog and get an instance of the DerivaML class.

In [14]:
test_catalog = create_test_catalog(hostname, domain_schema)
ml_instance = DerivaML(hostname, test_catalog.catalog_id, domain_schema, None, None, "1")

2024-09-10 13:20:48,782 - DEBUG - on lookup, default setting: GLOBUS_SDK_ENVIRONMENT=production
2024-09-10 13:20:48,783 - INFO - Creating client of type <class 'globus_sdk.services.auth.client.native_client.NativeAppAuthClient'> for service "auth"
2024-09-10 13:20:48,783 - DEBUG - Service URL Lookup for "auth" under env "production"
2024-09-10 13:20:48,784 - DEBUG - Service URL Lookup Result: "auth" is at "https://auth.globus.org/"
2024-09-10 13:20:48,784 - DEBUG - on lookup, default setting: GLOBUS_SDK_VERIFY_SSL=True
2024-09-10 13:20:48,784 - DEBUG - on lookup, default setting: GLOBUS_SDK_HTTP_TIMEOUT=60.0
2024-09-10 13:20:48,785 - DEBUG - initialized transport of type <class 'globus_sdk.transport.requests.RequestsTransport'>
2024-09-10 13:20:48,785 - INFO - Finished initializing AuthLoginClient. client_id='8ef15ba9-2b4a-469c-a163-7fd910c9d111', type(authorizer)=<class 'globus_sdk.authorizers.base.NullAuthorizer'>
2024-09-10 13:20:48,785 - DEBUG - Using code handlers (<fair_research_

In [15]:
# Optional: embed the Chaise page for the Subject table directly in the notebook.
#from IPython.display import IFrame
#display(IFrame(ml_instance.chaise_url("Subject"), 900, 500))


In [16]:
ml_instance.chaise_url("Subject")

'https://dev.eye-ai.org/chaise/recordset/#217/demo-schema:Subject@sort(RID)'
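The URL returned above appears to be composed of the hostname, the catalog ID, the schema-qualified table name, and a sort key. The following is a minimal sketch of how such a URL could be assembled by hand. The helper `recordset_url` and its parameters are hypothetical and are not part of the DerivaML API; the layout is inferred only from the single URL shown in this notebook.

```python
from urllib.parse import quote


def recordset_url(hostname: str, catalog_id: str, schema: str, table: str,
                  sort_column: str = "RID") -> str:
    """Hypothetical helper: build a Chaise recordset URL from its parts.

    The layout is an assumption based on the URL printed in this notebook.
    """
    # Percent-encode each component so names with spaces stay valid in a URL.
    schema_q = quote(schema, safe="")
    table_q = quote(table, safe="")
    sort_q = quote(sort_column, safe="")
    return (f"https://{hostname}/chaise/recordset/"
            f"#{catalog_id}/{schema_q}:{table_q}@sort({sort_q})")


url = recordset_url("dev.eye-ai.org", "217", "demo-schema", "Subject")
print(url)
```

In practice, `ml_instance.chaise_url(...)` is the call to use; the sketch only makes the structure of the result easier to read.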

In [17]:
[t.name for t in ml_instance.find_vocabularies()]

['Feature_Name',
 'Workflow_Type',
 'Dataset_Type',
 'Execution_Metadata_Type',
 'Execution_Asset_Type']

In [18]:
ml_instance.create_vocabulary("My termset", comment="Terms to use for generating tests")

2024-09-10 13:21:35,938 - DEBUG - Resetting dropped connection: dev.eye-ai.org
2024-09-10 13:21:36,473 - DEBUG - https://dev.eye-ai.org:443 "POST /ermrest/catalog/217/schema/demo-schema/table HTTP/11" 201 3918


<deriva.core.ermrest_model.Table at 0x1759fda90>

In [19]:
[t.name for t in ml_instance.find_vocabularies()]

['Feature_Name',
 'Workflow_Type',
 'Dataset_Type',
 'Execution_Metadata_Type',
 'Execution_Asset_Type',
 'My termset']

In [20]:
for i in range(5):
    ml_instance.add_term("My termset", f"Term{i}", description=f"My term {i}", synonyms=[f"t{i}", f"T{i}"])

2024-09-10 13:21:46,587 - DEBUG - Resetting dropped connection: dev.eye-ai.org
2024-09-10 13:21:46,831 - DEBUG - https://dev.eye-ai.org:443 "GET /ermrest/catalog/217/schema HTTP/11" 200 76593
2024-09-10 13:21:46,922 - DEBUG - Inserting entities to path: /entity/demo-schema:My%20termset?defaults=RMB,RMT,RCB,RID,ID,URI,RCT
2024-09-10 13:21:46,924 - DEBUG - yielding batch of 1/1 entities (0:1)
2024-09-10 13:21:46,996 - DEBUG - https://dev.eye-ai.org:443 "POST /ermrest/catalog/217/entity/demo-schema:My%20termset?defaults=RMB,RMT,RCB,RID,ID,URI,RCT HTTP/11" 200 571
2024-09-10 13:21:46,999 - DEBUG - Fetched 1 entities
2024-09-10 13:21:47,054 - DEBUG - https://dev.eye-ai.org:443 "GET /ermrest/catalog/217/schema HTTP/11" 304 0
2024-09-10 13:21:47,064 - DEBUG - Inserting entities to path: /entity/demo-schema:My%20termset?defaults=RMB,RMT,RCB,RID,ID,URI,RCT
2024-09-10 13:21:47,065 - DEBUG - yielding batch of 1/1 entities (0:1)
2024-09-10 13:21:47,142 - DEBUG - https://dev.eye-ai.org:443 "POST /e
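
In the request paths logged above, the vocabulary name `My termset` appears as `My%20termset`, because a space is not valid in a URL path and must be percent-encoded. The sketch below reproduces that encoding with the Python standard library. The helper `entity_path` is hypothetical and only mirrors the `/entity/schema:table` portion of the logged path; it is not how DerivaML or deriva-py build their requests internally.

```python
from urllib.parse import quote


def entity_path(schema: str, table: str) -> str:
    """Hypothetical helper: percent-encode a schema and table name into an
    ERMrest-style entity path, as seen in the debug log."""
    # safe="" forces every reserved character, including "/", to be encoded.
    return f"/entity/{quote(schema, safe='')}:{quote(table, safe='')}"


path = entity_path("demo-schema", "My termset")
print(path)  # /entity/demo-schema:My%20termset
```

Vocabulary names without spaces or reserved characters pass through this encoding unchanged.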

In [21]:
pd.DataFrame([{'Name': v.name, 'Description': v.description, 'Synonyms': v.synonyms} for v in ml_instance.list_vocabulary_terms("My termset")])

2024-09-10 13:21:49,856 - DEBUG - https://dev.eye-ai.org:443 "GET /ermrest/catalog/217/schema HTTP/11" 304 0
2024-09-10 13:21:49,866 - DEBUG - Fetching /entity/My%20termset:=demo-schema:My%20termset
2024-09-10 13:21:49,921 - DEBUG - https://dev.eye-ai.org:443 "GET /ermrest/catalog/217/entity/My%20termset:=demo-schema:My%20termset HTTP/11" 200 1691
2024-09-10 13:21:49,923 - DEBUG - Fetched 5 entities


    Name  Description  Synonyms
0  Term0    My term 0  [t0, T0]
1  Term1    My term 1  [t1, T1]
2  Term2    My term 2  [t2, T2]
3  Term3    My term 3  [t3, T3]
4  Term4    My term 4  [t4, T4]


In [22]:
# A term can be looked up by any of its synonyms; both calls return Term1.
print(ml_instance.lookup_term("My termset", "T1"))
print(ml_instance.lookup_term("My termset", "t1"))

2024-09-10 13:21:50,928 - DEBUG - https://dev.eye-ai.org:443 "GET /ermrest/catalog/217/schema HTTP/11" 304 0
2024-09-10 13:21:50,939 - DEBUG - Fetching /entity/My%20termset:=demo-schema:My%20termset
2024-09-10 13:21:50,991 - DEBUG - https://dev.eye-ai.org:443 "GET /ermrest/catalog/217/entity/My%20termset:=demo-schema:My%20termset HTTP/11" 304 0
2024-09-10 13:21:50,993 - DEBUG - Fetched 5 entities
2024-09-10 13:21:51,061 - DEBUG - https://dev.eye-ai.org:443 "GET /ermrest/catalog/217/schema HTTP/11" 304 0
2024-09-10 13:21:51,073 - DEBUG - Fetching /entity/My%20termset:=demo-schema:My%20termset
2024-09-10 13:21:51,126 - DEBUG - https://dev.eye-ai.org:443 "GET /ermrest/catalog/217/entity/My%20termset:=demo-schema:My%20termset HTTP/11" 304 0
2024-09-10 13:21:51,128 - DEBUG - Fetched 5 entities


name='Term1' synonyms=['t1', 'T1'] id='demo-schema:2RR' uri='/id/2RR' description='My term 1' rid='2RR'
name='Term1' synonyms=['t1', 'T1'] id='demo-schema:2RR' uri='/id/2RR' description='My term 1' rid='2RR'
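
Both lookups return the same term, and the log shows all five entities being fetched each time, which suggests that matching happens on the client side against names and synonyms. The following is a minimal sketch of that idea, not DerivaML's actual implementation. The `Term` dataclass and the `lookup_term` function here are stand-ins defined for illustration, and exact, case-sensitive matching is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Illustrative stand-in for a vocabulary term (not the DerivaML class)."""
    name: str
    description: str = ""
    synonyms: list = field(default_factory=list)


def lookup_term(terms, key):
    """Return the first term whose name or synonym list contains key.

    Assumes exact, case-sensitive matching; raises KeyError if nothing matches.
    """
    for term in terms:
        if key == term.name or key in term.synonyms:
            return term
    raise KeyError(f"No term or synonym matches {key!r}")


# Rebuild the five terms created earlier in this notebook.
terms = [
    Term(f"Term{i}", description=f"My term {i}", synonyms=[f"t{i}", f"T{i}"])
    for i in range(5)
]

print(lookup_term(terms, "T1").name)
print(lookup_term(terms, "t1").name)
```

A linear scan is adequate for small vocabularies like this one; a dictionary keyed by name and synonym would be the natural change for larger ones.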


In [None]:
# Clean up: permanently delete the test catalog created at the start of this notebook.
test_catalog.delete_ermrest_catalog(really=True)