# Generation of predictions using the OECD Toolbox API
- Created by: Louis Groff
- PIs: Imran Shah and Grace Patlewicz
- Last modified by GP: 28 March 2024
- Changes made: Additional notes on implementation added from the SI. Moved functions to metsim package. Tested metsim functions using Toolbox installed on remote server. Adjusted functions to accommodate hostnames other than localhost.

### Running OECD Toolbox API locally

Most of the effort required to run the Toolbox API goes into the initial setup of the local CLI instance of the Toolbox server, which is external to the Python environment. After that, the main inputs are the user-specified port number (port_number) over which the Toolbox communicates, a QSAR-Ready SMILES for the parent chemical obtained as described previously, and a numerical index that selects the “GUID” identifier hash (model_GUID) of the desired metabolic simulator from the list of available simulators returned by the Toolbox. The “GUIDs” are viewable within the Swagger UI for the Web API tool via:
http://localhost:<port_number>/api/v6/metabolism/
where the In Vivo Rat Simulator and In Vitro Rat Liver S9 models sit at indexes **8** and **15** of the returned list of simulators, respectively. The in.hcd_smiles is URL-encoded (smiles_url) using the “urllib” package in Python. An example metabolism API call structure is:
http://localhost:<port_number>/api/v6/metabolism/<model_GUID>?smiles=<smiles_url>
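A minimal sketch of how such a URL could be assembled. The text names only the “urllib” package, so the use of `urllib.parse.quote` here is an assumption, and `build_metabolism_url` is a hypothetical helper rather than a metsim function. The GUID is left as a placeholder.

```python
from urllib.parse import quote


def build_metabolism_url(port_number, model_guid, smiles, host="localhost"):
    """Assemble a metabolism API URL (hypothetical helper, not part of metsim)."""
    # safe="" percent-encodes every reserved character, e.g. "(", ")" and "=",
    # which occur frequently in SMILES strings.
    smiles_url = quote(smiles, safe="")
    return (
        f"http://{host}:{port_number}/api/v6/metabolism/"
        f"{model_guid}?smiles={smiles_url}"
    )


# "<model_GUID>" is a placeholder; a real GUID comes from the simulator list.
url = build_metabolism_url(30000, "<model_GUID>", "OCCOCC(O)=O")
print(url)
```

Encoding with `safe=""` is a conservative choice, since the default setting of `quote` would leave `/` unescaped.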
No further parameter tuning is available, and simulators run on TIMES defaults, except that the types of transformations are limited to phase I metabolism. The functions developed within this study take these inputs and perform the API calls needed to query the Toolbox for metabolites using the given simulator number and the given SMILES. The call returns the list of metabolite SMILES for that chemical if metabolites exist. The Toolbox does not provide a way to track the generation of its output phase I metabolites. If a given metabolism query to the Toolbox API does not yield metabolites, the Toolbox is further queried via its “search” functions using the parent chemical in:casrn to retrieve all available chemical entries for that in:casrn. The input casrn is stripped of hyphens in the API URL call (cas_nohyphen). The search parameters are also set to ignore stereochemical information (the true/false parameter at the end of the URL call). An example of the URL structure to perform a search on a casrn is given below:
http://localhost:<port_number>/api/v6/search/<cas_nohyphen>/true
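As an illustration, the search URL can be built from a hyphenated CASRN as follows. This is a sketch that follows the URL pattern quoted above. `build_search_url` and its `ignore_stereo` argument are hypothetical names, not part of metsim.

```python
def build_search_url(port_number, casrn, ignore_stereo=True, host="localhost"):
    """Assemble a CASRN search URL (hypothetical helper, not part of metsim)."""
    # The API call expects the CASRN without hyphens.
    cas_nohyphen = casrn.replace("-", "")
    # The trailing path segment controls whether stereochemistry is ignored.
    flag = "true" if ignore_stereo else "false"
    return f"http://{host}:{port_number}/api/v6/search/{cas_nohyphen}/{flag}"


search_url = build_search_url(30000, "15687-27-1")
print(search_url)
```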
These entries are filtered to remove mixtures by discarding any result whose “SubstanceType” parameter in the output of the API call does not equal “MonoConstituent”. Metabolism queries are then performed sequentially on each of the Chemical Identifier hash strings (ChemID) associated with mono-constituent chemical entries until the SMILES associated with one of the entries returns metabolite SMILES. In this case, the URL structure differs slightly from the SMILES-based API call above and instead uses the ChemID returned from the CASRN search:
http://localhost:<port_number>/api/v6/metabolism/<model_GUID>/<ChemID>
If none of the available mono-constituent chemical entries in the Toolbox API database yield metabolites, an empty output schema is stored accordingly to reflect this result.
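The fallback logic described above can be sketched without a running Toolbox by passing in the metabolism call as a function. The key names `SubstanceType` and `ChemId` follow the text, but the exact shape of the search results is an assumption. Both helper names and the mock data are hypothetical.

```python
def filter_monoconstituent(entries):
    """Keep only search results flagged as mono-constituent substances."""
    return [e for e in entries if e.get("SubstanceType") == "MonoConstituent"]


def first_chemid_with_metabolites(entries, run_metabolism):
    """Try each mono-constituent ChemId in order until one yields metabolites.

    run_metabolism is any callable mapping a ChemId to a list of metabolite
    SMILES; in practice it would wrap the ChemId-based metabolism API call.
    """
    for entry in filter_monoconstituent(entries):
        metabolites = run_metabolism(entry["ChemId"])
        if metabolites:
            return entry["ChemId"], metabolites
    # No mono-constituent entry produced metabolites: report an empty result.
    return None, []


# Mock search results and a mock metabolism lookup, for illustration only.
search_results = [
    {"ChemId": "id-mixture", "SubstanceType": "MultiConstituent"},
    {"ChemId": "id-empty", "SubstanceType": "MonoConstituent"},
    {"ChemId": "id-hit", "SubstanceType": "MonoConstituent"},
]
mock_metabolites = {"id-empty": [], "id-hit": ["OCCOCC=O"]}

chem_id, metabolites = first_chemid_with_metabolites(
    search_results, mock_metabolites.get
)
print(chem_id, metabolites)
```

Injecting the metabolism call keeps the selection logic testable on its own. The empty return value stands in for the empty output schema mentioned above.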

### Running OECD Toolbox API remotely

The WebSuite app is aimed towards local users who want to access the Toolbox WebAPI and WebClient on the same computer. When launched, WebSuite starts a WebAPI server bound to a random port on the local machine (127.0.0.1) which is only accessible to local users (on the same host).
To allow remote access to the WebAPI and the Web Client, you need to deploy the WebAPI server manually. There are various ways to do so (the server can be run either as a self-hosted service or behind a web server such as IIS). The simplest way is to run the WebAPI executable (LMC.Toolbox.WebAPI.exe, usually in %ProgramFiles(x86)%\QSAR Toolbox\QSAR Toolbox 4.5\Toolbox Server\Bin) from the command line, specifying a port in the urls parameter, e.g.:
LMC.Toolbox.WebAPI.exe --urls="http://0.0.0.0:80"

You then need to allow access to this port on the host. To access the web client, construct a URL in the following format (substituting <host> and <port> with the proper values):

http(s)://<host>:<port>/WebClient?webapihost=<host>&webapiport=<port>
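A small sketch of filling in this template from Python. `build_webclient_url` is a hypothetical helper, and the host name used below is a placeholder.

```python
def build_webclient_url(host, port, scheme="http"):
    """Fill in the Web Client URL template (hypothetical helper)."""
    # The same host and port appear in the address and in the query string.
    return f"{scheme}://{host}:{port}/WebClient?webapihost={host}&webapiport={port}"


webclient_url = build_webclient_url("example-host", 80)
print(webclient_url)
```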

### Import relevant libraries including the metsim functions

In [1]:
import os, sys
import pandas as pd

In [2]:
LIB = os.getcwd().replace("notebooks", "")

In [3]:
if LIB not in sys.path:
    sys.path.insert(0, LIB)

In [4]:
LIB

'/home/grace/Documents/python/metsim/'

In [13]:
from metsim.sim.metsim_tb import *

### Running MetSim using the QSAR Toolbox WebServer

### 1. metsim_run_toolbox_api():
    Runs a metabolism simulation based on available data in the Toolbox, using a QSAR-ready SMILES as input. If the simulation fails, the failing molecules are noted and passed to metsim_search_toolbox_api() to find alternate chemical IDs (ChemId in Toolbox outputs) and reattempt the simulation with alternate QSAR-ready SMILES from the Toolbox. On occasion, the default record returned for a SMILES metabolism simulation has no metabolites associated with it, but other records for the same chemical do.
    input: Toolbox API host and port number running within command prompt (required); simulator number (required, 0-17; the metsim_url_base subfunction gives the list of GUIDs from which the desired metabolism simulator is chosen; 15 = In Vitro Rat Liver S9 Phase I model from TIMES, 8 = In Vivo Rat Liver model from TIMES); QSAR-Ready SMILES string (required); CASRN (optional); DTXSID (optional); chemical name (optional); index (for multiprocessing, to keep track of the sequential order of input data during parallel processing).
    action: Simulates rat liver metabolism (assumed TIMES default of 3 cycles of Phase I, thresholded at 5 metabolites/cycle or 0.1 transformation probability; it is uncertain whether this is accurate).
    output: tuple of index parameter and standardized dictionary of precursor and successors for each chemical (not generationally tracked).
### 2. metsim_search_toolbox_api():
    inputs: same as metsim_run_toolbox_api()
    action: Searches the QSAR Toolbox database for alternate records for the same chemical and attempts metabolism simulation on each QSAR-ready SMILES record for that chemical until one succeeds or all fail. The standardized output dictionary is updated accordingly.
    output: same as metsim_run_toolbox_api()
    
### 3. metsim_tb_search_logkow():
    Search the OECD Toolbox database via its WebAPI for a chemical ID for an input chemical, and then 
    return the octanol-water partition coefficient to have a measure of its hydrophobicity.
    
    Inputs: 
    casrn: CAS Registry Number
    tb_port: Port number selected for running instance of the Toolbox WebServer
    host_name: Hostname for the running instance of the Toolbox WebServer
    Outputs:
    log_kow: Log10 scaled octanol-water partition coefficient, if available.

### Test run for the *in vitro* metabolism simulator using simulator number 15

In [7]:
metsim_run_toolbox_api(host_name=host_name, tb_port=30000, simulator_num=15, smiles='OCCOCCO')

Attempting metsim from SMILES input for index #None...
metsim succeeded for index #None


(None,
 {'datetime': '2024-03-28_18h19m18s',
  'software': 'OECD QSAR Toolbox WebAPI',
  'version': 6,
  'params': {'depth': 3,
   'organism': 'Rat',
   'site_of_metabolism': False,
   'model': ['Rat liver S9 metabolism simulator']},
  'input': {'smiles': 'OCCOCCO',
   'inchikey': None,
   'casrn': None,
   'hcd_smiles': None,
   'dtxsid': None,
   'chem_name': None},
  'output': [{'precursor': {'smiles': 'OCCOCCO',
     'inchikey': None,
     'casrn': None,
     'hcd_smiles': None,
     'dtxsid': None,
     'chem_name': None},
    'successors': [{'enzyme': None,
      'mechanism': None,
      'metabolite': {'smiles': 'OCCOCC(O)=O',
       'inchikey': None,
       'casrn': None,
       'hcd_smiles': None,
       'dtxsid': None,
       'chem_name': None}},
     {'enzyme': None,
      'mechanism': None,
      'metabolite': {'smiles': 'OCCOCC=O',
       'inchikey': None,
       'casrn': None,
       'hcd_smiles': None,
       'dtxsid': None,
       'chem_name': None}}]}]})
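
The nested output can be flattened into a plain list of metabolite SMILES. A minimal sketch, assuming only the dictionary layout shown above. The helper name `successor_smiles` is hypothetical, and the example dictionary is trimmed to the keys the helper reads.

```python
def successor_smiles(metsim_output):
    """Collect metabolite SMILES from a metsim output dictionary."""
    return [
        successor["metabolite"]["smiles"]
        for record in metsim_output["output"]       # one record per precursor
        for successor in record["successors"]       # one entry per metabolite
    ]


# Trimmed copy of the result shown above (unused keys omitted).
example_output = {
    "output": [
        {
            "precursor": {"smiles": "OCCOCCO"},
            "successors": [
                {"metabolite": {"smiles": "OCCOCC(O)=O"}},
                {"metabolite": {"smiles": "OCCOCC=O"}},
            ],
        }
    ]
}

smiles_list = successor_smiles(example_output)
print(smiles_list)
```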

### Test run using the *in vivo* rat liver simulator

In [8]:
metsim_run_toolbox_api(host_name=host_name, tb_port=30000, simulator_num=8, smiles='OCCOCCO')

Attempting metsim from SMILES input for index #None...
metsim succeeded for index #None


(None,
 {'datetime': '2024-03-28_18h19m23s',
  'software': 'OECD QSAR Toolbox WebAPI',
  'version': 6,
  'params': {'depth': 3,
   'organism': 'Rat',
   'site_of_metabolism': False,
   'model': ['in vivo Rat metabolism simulator']},
  'input': {'smiles': 'OCCOCCO',
   'inchikey': None,
   'casrn': None,
   'hcd_smiles': None,
   'dtxsid': None,
   'chem_name': None},
  'output': [{'precursor': {'smiles': 'OCCOCCO',
     'inchikey': None,
     'casrn': None,
     'hcd_smiles': None,
     'dtxsid': None,
     'chem_name': None},
    'successors': [{'enzyme': None,
      'mechanism': None,
      'metabolite': {'smiles': 'OCCOCC(O)=O',
       'inchikey': None,
       'casrn': None,
       'hcd_smiles': None,
       'dtxsid': None,
       'chem_name': None}},
     {'enzyme': None,
      'mechanism': None,
      'metabolite': {'smiles': 'OCCOCC=O',
       'inchikey': None,
       'casrn': None,
       'hcd_smiles': None,
       'dtxsid': None,
       'chem_name': None}}]}]})

### Test run using the calculator to query back predicted LogKow values 

In [9]:
metsim_tb_search_logkow(casrn = '15687-27-1', host_name = host_name, tb_port = 30000)

Searching Toolbox database for ChemIds using CASRN: 15687-27-1.
Toolbox database ChemIds found for CASRN: 15687-27-1 via Toolbox API search.
Monoconstituent ChemIds found within search results...
Calculating Log Kow value for ChemId 1/2...
Log Kow value successfully determined for CASRN: 15687-27-1.


3.793

### Test run using the search to query alternative ChemIds, using the default *in vitro* rat simulator

In [14]:
metsim_search_toolbox_api(casrn = '15687-27-1', host_name = host_name, tb_port = 30000)

Base SMILES query for index #None yields no metabolites. Searching for alternate ChemIds...
Alternate ChemId corresponding to a QSAR-Ready SMILES found for index #None. Attempting metsim...
Metsim succeeded for index #None


(None,
 {'datetime': '2024-03-28_18h33m14s',
  'software': 'OECD QSAR Toolbox WebAPI',
  'version': 6,
  'params': {'depth': 3,
   'organism': 'Rat',
   'site_of_metabolism': False,
   'model': ['Rat liver S9 metabolism simulator']},
  'input': {'smiles': None,
   'inchikey': None,
   'casrn': '15687-27-1',
   'hcd_smiles': None,
   'dtxsid': None,
   'chem_name': None},
  'output': [{'precursor': {'smiles': None,
     'inchikey': None,
     'casrn': '15687-27-1',
     'hcd_smiles': None,
     'dtxsid': None,
     'chem_name': None},
    'successors': [{'enzyme': None,
      'mechanism': None,
      'metabolite': {'smiles': 'CC(C(O)=O)c1ccc(CC(C)(C)O)cc1',
       'inchikey': None,
       'casrn': None,
       'hcd_smiles': None,
       'dtxsid': None,
       'chem_name': None}},
     {'enzyme': None,
      'mechanism': None,
      'metabolite': {'smiles': 'CC(CO)Cc1ccc(cc1)C(C)C(O)=O',
       'inchikey': None,
       'casrn': None,
       'hcd_smiles': None,
       'dtxsid': None,
    

In [9]:
import json
import multiprocessing as mp

with open('smpdb_jcim_valid_aggregate_112parents.json', 'r') as f:
    drug_dataset = json.load(f)
pool = mp.Pool(mp.cpu_count())  #one worker process per available processor.
tb_metsim_vitro = pool.starmap_async(toolbox_metsim_api,
                                      #arguments (must be listed in the same order as given in the function definition):
                                      [(16384, #tb_port
                                        15, #simulator_num
                                        drug_dataset[idx]['input']['hcd_smiles'], #smiles
                                        drug_dataset[idx]['input']['casrn'], #casrn
                                        drug_dataset[idx]['input']['dtxsid'], #dtxsid
                                        drug_dataset[idx]['input']['chem_name'], #Chemical Name
                                        idx #index
                                       )
                                       for idx in range(len(drug_dataset[0:5]))]).get()

#indices whose second tuple element is an int rather than an output dictionary are rerun via the search function:
metsim_rerun_vitro = [i for i in range(len(tb_metsim_vitro)) if isinstance(tb_metsim_vitro[i][1], int)]
for idx in metsim_rerun_vitro:
    tb_metsim_vitro[idx] = pool.apply(toolbox_metsim_api_search,
                                       #arguments (must be listed in the same order as given in the function definition):
                                       args = (16384, #tb_port
                                                15, #simulator_num
                                                drug_dataset[idx]['input']['hcd_smiles'], #smiles
                                                drug_dataset[idx]['input']['casrn'], #casrn
                                                drug_dataset[idx]['input']['dtxsid'], #dtxsid
                                                drug_dataset[idx]['input']['chem_name'], #Chemical Name
                                                tb_metsim_vitro[idx][0] #index
                                              )
                                      )
#keep output dictionaries in list, remove tuple index:
tb_metsim_vitro = [result[1] for result in tb_metsim_vitro]
pool.close()
pool.join()  #wait for worker processes to exit.

In [None]:
metsim_metadata_full(tb_metsim_vitro, fnam = 'tb_metsim_invivoratsimulator_112parents.json')