### Generation of predictions using Chemical Transformation Simulator (CTS)
- Created by: Louis Groff
- PIs: Imran Shah and Grace Patlewicz (GP)
- Last modified by GP: 5 April 2024
- Changes made: Tested sample SMILES to verify that the function calls produce HCD and CTS outputs. Added extra examples.

## Running CTS

CTS can be run through its REST API at https://qed.epa.gov/cts/rest. The API is queried with a POST call using the requests package, where the url parameter is https://qed.epa.gov/cts/rest/metabolizer/run. The model parameters are passed as a dictionary (the json keyword argument of requests.post) with three keys:
- "structure": the SMILES string of the query chemical
- "generationLimit": an integer from 1 to 4 giving the desired transformation depth
- "transformationLibraries": the model choice

The ChemAxon Human Phase I metabolizer is selected by setting "transformationLibraries" to ["mammalian_metabolism"]. The json parameter therefore has the following structure, where the values in braces are placeholders:

`{"structure": {qsar_ready_smiles}, "generationLimit": {1-4}, "transformationLibraries": ["mammalian_metabolism"]}`

A generationLimit of 3 was chosen for the examples in this notebook because the current CTS REST API (as of November 2023) is prone to running out of memory for large metabolism trees. Workarounds and solutions to this issue are in progress, but limiting the transformation depth to 1-3 generations is recommended to maximize the number of metabolism queries that return successfully.
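
As a minimal sketch of the request described above, the payload can be built and posted roughly as follows. The helper names `build_cts_payload` and `post_cts_query`, the timeout value, and the assumption that the response body is JSON are illustrative and are not taken from the metsim package. Only the URL and the three payload keys come from the description above.

```python
CTS_METABOLIZER_URL = "https://qed.epa.gov/cts/rest/metabolizer/run"


def build_cts_payload(smiles, generation_limit=3):
    """Build the POST body described above (hypothetical helper)."""
    if not isinstance(generation_limit, int) or not 1 <= generation_limit <= 4:
        raise ValueError("generationLimit must be an integer from 1 to 4")
    return {
        "structure": smiles,
        "generationLimit": generation_limit,
        "transformationLibraries": ["mammalian_metabolism"],
    }


def post_cts_query(smiles, generation_limit=3, timeout=300):
    """POST the payload to CTS and return the decoded body (assumed to be JSON)."""
    import requests  # imported here so the payload builder works without it

    response = requests.post(
        CTS_METABOLIZER_URL,
        json=build_cts_payload(smiles, generation_limit),
        timeout=timeout,  # assumed value; large trees may take a long time
    )
    response.raise_for_status()  # raise on HTTP error status codes
    return response.json()
```

Validating the depth before sending avoids spending a slow request on a value the service is not expected to accept.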

### Import relevant libraries

In [2]:
import json, os, sys  # json is used later to reload the saved results

### Set LIB directory to call functions from metsim package

In [3]:
LIB = os.getcwd().replace("notebooks", "")

In [4]:
if LIB not in sys.path:
    sys.path.insert(0, LIB)

In [17]:
processed_dir = LIB + 'data/processed/'
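
The string replacement used for LIB removes every occurrence of "notebooks" from the working directory path, which works when the notebook is run from a directory named notebooks directly under the repository root. As a hedged alternative, the same intent can be sketched with pathlib. The function name `repo_root_from` is hypothetical, and the assumed layout (a `notebooks` directory next to `data/processed`) is inferred from the cells above.

```python
from pathlib import Path


def repo_root_from(cwd, notebooks_dirname="notebooks"):
    """Return the parent of the nearest enclosing notebooks directory.

    Falls back to `cwd` itself when no such directory is found.
    """
    cwd = Path(cwd)
    for candidate in (cwd, *cwd.parents):
        if candidate.name == notebooks_dirname:
            return candidate.parent
    return cwd


# Example with an illustrative path; Path.cwd() would be used in the notebook.
root = repo_root_from("/home/user/metsim/notebooks")
processed = root / "data" / "processed"
```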

In [6]:
from metsim.sim.metsim_cts import *

### Testing CTS metsim query function for Ibuprofen using 3 generations

In [10]:
test_cts = metsim_cts_query(smiles='CC(C)CC1=CC=C(C=C1)C(C)C(O)=O', gens=3)



CTS API successfully returned metabolism predictions for 3 cycles of Phase I metabolism for input SMILES: CC(C)CC1=CC=C(C=C1)C(C)C(O)=O


### Example of querying Standardizer for a single chemical, Ibuprofen

In [7]:
metsim_hcd_out(smiles='CC(C)CC1=CC=C(C=C1)C(C)C(O)=O')

Attempting query of Cheminformatics Modules Standardizer with SMILES: CC(C)CC1=CC=C(C=C1)C(C)C(O)=O...
Query succeeded.


{'smiles': 'CC(C)CC1=CC=C(C=C1)C(C)C(O)=O',
 'casrn': '15687-27-1',
 'hcd_smiles': 'CC(CC1C=CC(C(C(=O)O)C)=CC=1)C',
 'inchikey': 'HEFNNWSXXWATRW-YHMJCDSINA-N',
 'dtxsid': 'DTXSID5020732',
 'chem_name': 'Ibuprofen',
 'likelihood': None}

### Testing CTS metsim run function for Ibuprofen using 3 generations

In [11]:
test_cts = metsim_run_cts(in_smiles='CC(C)CC1=CC=C(C=C1)C(C)C(O)=O', depth=3)



CTS API successfully returned metabolism predictions for 3 cycles of Phase I metabolism for input SMILES: CC(C)CC1=CC=C(C=C1)C(C)C(O)=O
children key found in dictionary
7 children found for precursor generation level 0
precursor-successor relationships appended for current generational level.
children found in next generational level, recursing...
children key found in dictionary
9 children found for precursor generation level 1
precursor-successor relationships appended for current generational level.
children found in next generational level, recursing...
children key found in dictionary
8 children found for precursor generation level 2
precursor-successor relationships appended for current generational level.
children found in next generational level, recursing...
children key found in dictionary
6 children found for precursor generation level 2
precursor-successor relationships appended for current generational level.
children found in next generational level, recursing...
children
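
The log above suggests that the run function walks a nested response in which each node may carry a `children` key, recording precursor-successor relationships at each level before recursing. A toy sketch of that pattern follows. The node layout (`smiles`, `children`), the placeholder labels, and the function name `flatten_tree` are assumptions for illustration; they do not reflect the actual CTS response schema or the metsim implementation.

```python
def flatten_tree(node, generation=0, out=None):
    """Collect one precursor-successors record per node that has children."""
    if out is None:
        out = []
    children = node.get("children", [])
    if children:
        out.append(
            {
                "precursor": node["smiles"],
                "generation": generation + 1,  # generation of the successors
                "successors": [child["smiles"] for child in children],
            }
        )
        for child in children:
            flatten_tree(child, generation + 1, out)  # recurse one level deeper
    return out


# Toy tree with placeholder labels instead of real SMILES strings.
toy_tree = {
    "smiles": "A",
    "children": [
        {"smiles": "B", "children": [{"smiles": "D", "children": []}]},
        {"smiles": "C", "children": []},
    ],
}
records = flatten_tree(toy_tree)
```

Passing the accumulator `out` through the recursion keeps all records in a single flat list, in the order the nodes are visited.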

In [12]:
test_cts

{'datetime': '2024-04-05_16h49m00s',
 'software': 'EPA Chemical Transformation Simulator',
 'version': '1.3.2.2',
 'params': {'depth': 3,
  'organism': 'human',
  'site_of_metabolism': False,
  'model': 'ChemAxon Human Phase I'},
 'input': {'smiles': 'CC(C)CC1=CC=C(C=C1)C(C)C(O)=O',
  'inchikey': None,
  'casrn': None,
  'hcd_smiles': None,
  'dtxsid': None,
  'chem_name': None},
 'output': [{'precursor': {'smiles': 'CC(C)CC1=CC=C(C=C1)C(C)C(O)=O',
    'inchikey': None,
    'casrn': None,
    'hcd_smiles': None,
    'dtxsid': None,
    'chem_name': None,
    'likelihood': 'UNLIKELY'},
   'successors': [{'enzyme': None,
     'mechanism': 'BenzylicHydroxylation',
     'generation': 1,
     'metabolite': {'smiles': 'CC(C)C(O)c1ccc(cc1)C(C)C(O)=O',
      'inchikey': None,
      'casrn': None,
      'hcd_smiles': None,
      'dtxsid': None,
      'chem_name': None,
      'likelihood': 'LIKELY'}},
    {'enzyme': None,
     'mechanism': 'CarbonylAlphaHydroxylation',
     'generation': 1,
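
The output shown here is cut off, but the visible structure (an `output` list of records, each with a `precursor` and a list of `successors`) can be traversed with a short loop. The sketch below is illustrative only: the function name `list_successors` is hypothetical, and the example dictionary is trimmed to a single successor taken from the output above.

```python
def list_successors(result):
    """Return (generation, mechanism, precursor SMILES, metabolite SMILES) tuples."""
    rows = []
    for entry in result.get("output", []):
        precursor = entry["precursor"]["smiles"]
        for successor in entry.get("successors", []):
            rows.append(
                (
                    successor["generation"],
                    successor["mechanism"],
                    precursor,
                    successor["metabolite"]["smiles"],
                )
            )
    # Sort on generation only, since other fields may be None.
    return sorted(rows, key=lambda row: row[0])


# Trimmed example using only fields that appear in the output above.
example = {
    "output": [
        {
            "precursor": {"smiles": "CC(C)CC1=CC=C(C=C1)C(C)C(O)=O"},
            "successors": [
                {
                    "enzyme": None,
                    "mechanism": "BenzylicHydroxylation",
                    "generation": 1,
                    "metabolite": {"smiles": "CC(C)C(O)c1ccc(cc1)C(C)C(O)=O"},
                }
            ],
        }
    ]
}
rows = list_successors(example)
```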
    

### Batch processing the CTS MetSim output through Cheminformatics Modules to obtain precursor-successor metabolite metadata and save the results to a JSON file:

In [21]:
cts_full = metsim_metadata_full(test_cts, fnam = "2024_04_02_CTS_Ibuprofen.json")
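
Saving a result and reloading it later amounts to a JSON round trip. A minimal sketch using context managers, so that file handles are closed promptly, is given here. The helper names `save_json` and `load_json`, the temporary file name, and the small demo record are hypothetical and do not describe how `metsim_metadata_full` writes its file.

```python
import json
import os
import tempfile


def save_json(obj, path):
    """Write `obj` to `path` as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(obj, handle, indent=2)


def load_json(path):
    """Read and decode the JSON document stored at `path`."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Round trip in a temporary directory; the record is a trimmed stand-in.
demo_record = [
    {"software": "EPA Chemical Transformation Simulator", "params": {"depth": 3}}
]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_result.json")
    save_json(demo_record, demo_path)
    reloaded = load_json(demo_path)
```

The reloaded Ibuprofen file in the next cell is a list wrapping the result dictionary, so the result itself would be the first element of whatever a loader like this returns.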

### Load the completed CTS MetSim result for Ibuprofen from the saved JSON file:

In [22]:
json.load(open(processed_dir+'2024_04_02_CTS_Ibuprofen.json','r'))

[{'datetime': '2024-04-05_16h49m00s',
  'software': 'EPA Chemical Transformation Simulator',
  'version': '1.3.2.2',
  'params': {'depth': 3,
   'organism': 'human',
   'site_of_metabolism': False,
   'model': 'ChemAxon Human Phase I'},
  'input': {'smiles': 'CC(C)CC1=CC=C(C=C1)C(C)C(O)=O',
   'casrn': '15687-27-1',
   'hcd_smiles': 'CC(CC1C=CC(C(C(=O)O)C)=CC=1)C',
   'inchikey': 'HEFNNWSXXWATRW-YHMJCDSINA-N',
   'dtxsid': 'DTXSID5020732',
   'chem_name': 'Ibuprofen',
   'likelihood': None},
  'output': [{'precursor': {'smiles': 'CC(C)CC1=CC=C(C=C1)C(C)C(O)=O',
     'casrn': '15687-27-1',
     'hcd_smiles': 'CC(CC1C=CC(C(C(=O)O)C)=CC=1)C',
     'inchikey': 'HEFNNWSXXWATRW-YHMJCDSINA-N',
     'dtxsid': 'DTXSID5020732',
     'chem_name': 'Ibuprofen',
     'likelihood': None},
    'successors': [{'enzyme': None,
      'mechanism': 'BenzylicHydroxylation',
      'generation': 1,
      'metabolite': {'smiles': 'CC(C)C(O)c1ccc(cc1)C(C)C(O)=O',
       'casrn': '53949-53-4',
       'hcd_smiles