# Virtually metabolize GNPS annotations and prepare for Network Annotation Propagation or SIRIUS

Made by Louis-Felix Nothias (UC San Diego), louisfelix.nothias@gmail.com. Started in 2018 and improved in May 2021.

This notebook downloads results of spectral annotations from classical or feature-based molecular networking job from GNPS [[http://gnps.ucsd.edu](http://gnps.ucsd.edu)] and generate virtual metabolites either with SyGMa or BioTransformer. The resulting candidates can be used for [Network Annotation Propagation](https://ccms-ucsd.github.io/GNPSDocumentation/nap/) on GNPS or with [SIRIUS](https://boecker-lab.github.io/docs.sirius.github.io/install/).

> Start by running the cell below to initiate the libraries.

In [1]:
import sys
sys.path.append('gnps_postprocessing/lib')
sys.path.append('src')
from gnps_download_results import *
from consolidate_structures import *
from gnps_results_postprocess import *
from prepare_virtual_metabolization import *
from run_virtual_metabolization import *

## Mandatory - Download annotation from the GNPS job
 
> Replace the job ID from the GNPS molecular networking job in the URL in the cell below (line 3). We support both classical molecular networking and feature-based molecular networking (FBMN) jobs.

You can try the classical MN job from that paper https://pubs.acs.org/doi/10.1021/acs.analchem.8b05854 with the ID `'bbee697a63b1400ea585410fafc95723'`. 

An other test job for feature-based molecular networking (FBMN) is `'e78a8c8f429a46fcb24f3b34d69aff25'`.

In [2]:
job_id = 'bbee697a63b1400ea585410fafc95723'

gnps_download_results(job_id, output_folder ='all_annotations')

This is the GNPS job link: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=bbee697a63b1400ea585410fafc95723
Downloading the following content: https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task=bbee697a63b1400ea585410fafc95723&view=view_all_annotations_DB


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 29.5M    0 29.5M    0     0  1839k      0 --:--:--  0:00:16 --:--:-- 3877k


GNPS job results were succesfully downloaded as: all_annotations.zip
GNPS job results were succesfully extracted into the folder: all_annotations
   CLASSICAL MOLECULAR NETWORKING job detected
      199 spectral library annotations in the job.
      9643 nodes in the network (including single nodes)


In [3]:
gnps_download_results.df_annotations.head(2)

Unnamed: 0,#Scan#,Adduct,CAS_Number,Charge,Compound_Name,Compound_Source,Data_Collector,ExactMass,FileScanUniqueID,INCHI,...,RT_Query,SharedPeaks,Smiles,SpecCharge,SpecMZ,SpectrumFile,SpectrumID,TIC_Query,UpdateWorkflowName,tags
0,100631,[M+H]+,,1,MoNA:594132 Octocrylene,isolated,MoNA,0.0,spectra/specs_ms.pklbin100631,InChI=1S/C24H27NO2/c1-3-5-12-19(4-2)18-27-24(2...,...,73.567,5,,0,362.208,spectra/specs_ms.pklbin,CCMSLIB00000566191,1910.37,UPDATE-SINGLE-ANNOTATED-BRONZE,
1,100637,[M+H]+,,1,MoNA:594132 Octocrylene,isolated,MoNA,0.0,spectra/specs_ms.pklbin100637,InChI=1S/C24H27NO2/c1-3-5-12-19(4-2)18-27-24(2...,...,433.325,7,,0,362.212,spectra/specs_ms.pklbin,CCMSLIB00000566191,31304.2,UPDATE-SINGLE-ANNOTATED-BRONZE,


## Mandatory - Consolidating structures identifier

> Run the cell below to have a complete set of Smiles and InChI for the annotations.

**IMPORTANT: Note that only spectral annotations that have a valid InChI or SMILES identifier will be considered downstream. If the annotations you are interested in don't have an identifier in the library, go back to the GNPS library entry, update the entry by adding an identifier, and rerun your GNPS job**

In [4]:
df_annotations_consolidated  = consolidate_and_convert_structures(gnps_download_results.df_annotations, prefix='', 
                                                                  smiles='Smiles', inchi='INCHI')

 ==== Consolidating structures from SMILES and/or InChI ====
Both SMILES and InChI were inputted
Converting SMILES to mol object
  Salt(s) deleted in       : CC(C)N=C(N)N=C(N)Nc1ccc(Cl)cc1.Cl
  Remaining residue        : CC(C)N=C(N)N=C(N)Nc1ccc(Cl)cc1
  Salt(s) deleted in       : C12(=CC=C(C=C1COC2(CCCN(C)C)C3=CC=C(C=C3)F)C#N).OC(=O)C(=O)O
  Remaining residue        : CN(C)CCCC1(c2ccc(F)cc2)OCc2cc(C#N)ccc21
Succesfully converted to mol object: 110
Exception to the parsing: 0
Not available: 90
Converting INCHI to mol object
Succesfully converted to mol object: 104
Exception to the parsing: 0
Not available: 96
Consolidating the lists
Total mol object from the list 1 = 110
Mol object consolidated from list 2 = 12
Consolidated structures = 122
Converting mol objects to SMILES iso
Converting mol objects to SMILES
Converting mol objects to InChI
Converting mol objects to InChIKey
End


In [5]:
# We keep only annotations with a structure identifier

df_annotations = get_info_gnps_annotations(df_annotations_consolidated, 
                          inchi_column='Consol_InChI', 
                          smiles_column = 'Consol_SMILES', 
                          smiles_planar_column='Consol_SMILES_iso')

200 annotations detected
that corresponds to 61 unique stereostructures
that corresponds to 58 unique planar structures
78 annotations dont have a structure identifier and will be discarded from downstream processing, unless you do the following:
You can either update the GNPS library and rerun the GNPS job. Or you can provide a structure identifier in the dedicated cell below
These are the compounds without structure identifiers:
'(+)-Catechin',
'(.+/-.)-8-Hydroxy-5Z,9E,11Z,14Z,17Z-eicosapentaenoic acid',
'2-(Cyclohexylamino)ethanesulfonic acid',
'3,4'-Dimethoxy-2-hydroxychalcone',
'4-Hydroxy-4'-methyldiphenylamine',
'B10A30 Faulkner legacy library looks like sterol or lipid needs to be verified',
'Benzalkonium chloride (C12)',
'Betulinic acid',
'Betulinic acid methyl ester',
'Conjugated linoleic Acid (10E,12Z)',
'Dioctyl phthalate',
'Lyso-PC(16:0)',
'Procyanidin B2',
'Spectral Match to (+)-Catechin from NIST14',
'Spectral Match to (-)-Epicatechin from NIST14',
'Spectral Match to 1-He

### [Advanced optional feature - Recommended to ignore] - Filter annotations based on compound name

If you want to apply this filter, convert the cell type from raw to code.

##### Optional - Display compound name

In [6]:
list_compounds = set(df_annotations['Compound_Name'])
print_compound_names(list_compounds)

'(+)-Catechin',
'(-)-Catechin',
'(-)-Secoisolariciresinol',
'.alpha.-Cyclodextrin',
'.alpha.-Hexylcinnamaldehyde',
'.alpha.-Ionone',
'3-(2-Hydroxyphenyl)propionic acid',
'3-Methoxycinnamic acid',
'4-(4-Aminophenoxy)aniline',
'7-Oxocholesterol',
'CocamidopropylBetaine',
'Dimethyldioctadecylammonium cation',
'Escitalopram Oxalate',
'Ethyl 3-hydroxybenzoate',
'Glycerol tricaprate',
'Glycerol tricaprylate',
'Isobergaptene',
'Isoxadifen-ethyl',
'Lauric acid diethanolamide',
'MLS000028461-01!URSODEOXYCHOLIC ACID',
'MLS001332387-01!Chlorhexidine55-56-1',
'MLS002154090-01!Proguanil hydrochloride637-32-1',
'Massbank: Dextromethorphan',
'Massbank:EA019905 Trimethoprim|2,4-Diamino-5-(3,4,5-trimethoxybenzyl)pyrimidine|5-[(3,4,5-trimethoxyphenyl)methyl]pyrimidine-2,4-diamine',
'Massbank:EA066012 Galaxolidone|1,3,4,6,7,8-Hexahydro-4,6,6,7,8,8-hexamethylcyclopenta[g]-2-benzopyran-1-on',
'Massbank:EA280805 Mycophenolic acid| (E)-6-(4-hydroxy-3-keto-6-methoxy-7-methyl-phthalan-5-yl)-4-methyl-hex-4-enoi

##### Optional - Select compound name to keep

Replace the compound names in the list `compound_name_to_keep`


In [7]:
compound_name_to_keep = ['Isobergaptene',
                         'Isoxadifen-ethyl',
                         'Lauric acid diethanolamide',
                         'Lyso-PC(16:0)',
                         'MLS000028461-01!URSODEOXYCHOLIC ACID',
                         'MLS001332387-01!Chlorhexidine55-56-1',
                         'MLS002154090-01!Proguanil hydrochloride637-32-1'
                        ]

### [Advanced optional feature - Recommended to ignore]  - Filter annotations based on tags

If you want to apply this filter convert the cell type from raw to code.

##### Optional - Display tags-annotations

In [8]:
print_compound_name_for_tags(df_annotations)

These are the compounds with tags:
Tag: 'Antibiotic[Drug Class]'
	'MLS002154090-01!Proguanil hydrochloride637-32-1'
Tag: 'Nonionic Surfactant[Source Environment]'
	'Sorbitane Monopalmitate - Polysorbate 40 in-source fragment'
Tag: 'Surfactant[Chemical Family]'
	'CocamidopropylBetaine'


##### Optional - Select tags to keep

Specify the tags in the list `tags_to_keep`

In [9]:
tags_to_keep = ['Antibiotic[Drug Class]','Nonionic Surfactant[Source Environment]']    # you can use multiple tags with ['Antibiotic[Drug Class]','other_tag_here']

In [10]:
# Apply Compound_Name filter and/or tags filter (if any)

# We check if those lists exists and process as needed:
if compound_name_to_keep and tags_to_keep:
    df_annotations_filtered = df_annotations_filtering(df_annotations, compound_name=compound_name_to_keep, tags=tags_to_keep)
    print('Compound name and tags filtering applied')
elif compound_name_to_keep:
    df_annotations_filtered = df_annotations_filtering(df_annotations, compound_name=compound_name_to_keep)
    print('Compound name filtering applied')
elif tags_keep:
    df_annotations_filtered = df_annotations_filtering(df_annotations, tags=tags_to_keep)
    print('Tag(s) filtering applied')
else:
    print('No Compound_Name or Tags filter were used')
    
print('Number of annotations after filtering = '+str(df_annotations_filtered.shape[0]))

Compound name and tags filtering applied
Number of annotations after filtering = 13


## Mandatory - Choose between planar or stereochemical SMILES

### [RECOMMENDED] Use the planar SMILES for virtual metabolization (no stereochemistry specified)

Run the cell below to use planar isomers and ignore the cell after. This is recommended as it reflects the confidence computational mass spectrometry annotation can achieve and limits the number of candidates to compute.

In [11]:
use_planar_structure=True # or False
prepare_for_virtual_metabolization(df_annotations_filtered,
                                    smiles_column = 'Consol_SMILES', 
                                    smiles_planar_column='Consol_SMILES_iso',
                                    drop_duplicated_structure = True, 
                                    use_planar_structure= True)

Number of spectral library annotations = 13
Number of spectral annotations with planar SMILES/InChI = 13
Number of unique planar SMILES considered = 7


Unnamed: 0,#Scan#,Adduct,CAS_Number,Charge,Compound_Name,Compound_Source,Data_Collector,ExactMass,FileScanUniqueID,INCHI,...,SpecMZ,SpectrumFile,SpectrumID,TIC_Query,UpdateWorkflowName,tags,Consol_SMILES_iso,Consol_SMILES,Consol_InChIKey,Consol_InChI
11,126417,M+H,,0,Sorbitane Monopalmitate - Polysorbate 40 in-so...,Lysate,AMelnik,0.0,spectra/specs_ms.pklbin126417,,...,402.358,spectra/specs_ms.pklbin,CCMSLIB00000577482,209069.0,UPDATE-SINGLE-ANNOTATED-BRONZE,Nonionic Surfactant[Source Environment],CCCCCCCCCCCCCCCC(=O)OCC(O)C1OCC(O)C1O,CCCCCCCCCCCCCCCC(=O)OCC(O)C1OCC(O)C1O,IYFATESGLOUGBX-UHFFFAOYSA-N,InChI=1S/C22H42O6/c1-2-3-4-5-6-7-8-9-10-11-12-...
5,20450,M+H-C2H6O,163520330.0,1,Isoxadifen-ethyl,Isolated,NIST,0.0,spectra/specs_ms.pklbin20450,InChI=1S/C18H17NO3/c1-2-21-17(20)16-13-18(22-1...,...,250.088,spectra/specs_ms.pklbin,CCMSLIB00003562356,76095.8,UPDATE-SINGLE-ANNOTATED-BRONZE,,CCOC(=O)C1=NOC(c2ccccc2)(c2ccccc2)C1,CCOC(=O)C1=NOC(c2ccccc2)(c2ccccc2)C1,MWKVXOJATACCCH-UHFFFAOYSA-N,InChI=1S/C18H17NO3/c1-2-21-17(20)16-13-18(22-1...
8,22092,M+2H,,2,MLS001332387-01!Chlorhexidine55-56-1,NIH Pharmacologically Active Library,VP/LMS,504.203,spectra/specs_ms.pklbin22092,,...,253.109,spectra/specs_ms.pklbin,CCMSLIB00000084912,7827.59,UPDATE-SINGLE-ANNOTATED-GOLD,,N=C(NCCCCCCNC(=N)NC(=N)Nc1ccc(Cl)cc1)NC(=N)Nc1...,N=C(NCCCCCCNC(=N)NC(=N)Nc1ccc(Cl)cc1)NC(=N)Nc1...,GHXZTYHSJHQHIJ-UHFFFAOYSA-N,InChI=1S/C22H30Cl2N10/c23-15-5-9-17(10-6-15)31...
9,22309,M+H,,1,MLS002154090-01!Proguanil hydrochloride637-32-1,NIH Pharmacologically Active Library,VP/LMS,253.109,spectra/specs_ms.pklbin22309,,...,254.108,spectra/specs_ms.pklbin,CCMSLIB00000085607,49078.0,UPDATE-SINGLE-ANNOTATED-GOLD,Antibiotic[Drug Class],CC(C)N=C(N)N=C(N)Nc1ccc(Cl)cc1,CC(C)N=C(N)N=C(N)Nc1ccc(Cl)cc1,SSOLNOMRVKKSON-UHFFFAOYSA-N,InChI=1S/C11H16ClN5/c1-7(2)15-10(13)17-11(14)1...
10,43303,M+H,120401.0,1,Lauric acid diethanolamide,Isolated,NIST,0.0,spectra/specs_ms.pklbin43303,InChI=1S/C16H33NO3/c1-2-3-4-5-6-7-8-9-10-11-16...,...,288.253,spectra/specs_ms.pklbin,CCMSLIB00003708493,19787.5,UPDATE-SINGLE-ANNOTATED-BRONZE,,CCCCCCCCCCCC(=O)N(CCO)CCO,CCCCCCCCCCCC(=O)N(CCO)CCO,AOMUHOFOVNGZAN-UHFFFAOYSA-N,InChI=1S/C16H33NO3/c1-2-3-4-5-6-7-8-9-10-11-16...
0,103,M+H-C2O2,482484.0,1,Isobergaptene,Isolated,NIST,0.0,spectra/specs_ms.pklbin103,InChI=1S/C12H8O4/c1-14-9-6-10-8(4-5-15-10)12-7...,...,161.061,spectra/specs_ms.pklbin,CCMSLIB00003572342,6880.47,UPDATE-SINGLE-ANNOTATED-BRONZE,,COc1cc2occc2c2oc(=O)ccc12,COc1cc2occc2c2oc(=O)ccc12,AJSPSRWWZBBIOR-UHFFFAOYSA-N,InChI=1S/C12H8O4/c1-14-9-6-10-8(4-5-15-10)12-7...
1,121815,M+H,,1,MLS000028461-01!URSODEOXYCHOLIC ACID,NIH Pharmacologically Active Library,VP/LMS,392.293,spectra/specs_ms.pklbin121815,,...,393.297,spectra/specs_ms.pklbin,CCMSLIB00000084832,42723.1,UPDATE-SINGLE-ANNOTATED-GOLD,,CC(CCC(=O)O)C1CCC2C3C(O)CC4CC(O)CCC4(C)C3CCC12C,C[C@H](CCC(=O)O)[C@H]1CC[C@H]2[C@@H]3[C@@H](O)...,RUDATBOHQWOJDD-UZVSRGJWSA-N,InChI=1S/C24H40O4/c1-14(4-7-21(27)28)17-5-6-18...


In [12]:
# Appending structures for virtual metabolization
# You can proceed by manually appending the pairs of compound name and SMILES [the order should match in both list]

extra_compound_names = ['Harpagoside',
                       'boswellic acid']

extra_smiles = ['C[C@@]1(C[C@H]([C@]2([C@@H]1[C@@H](OC=C2)O[C@H]3[C@@H]([C@H]([C@@H]([C@H](O3)CO)O)O)O)O)O)OC(=O)/C=C/c4ccccc4'
               'CC1CCC2(CCC3(C(=CCC4C3(CCC5C4(CCC(C5(C)C(=O)O)O)C)C)C2C1C)C)C']
    
append_to_list_if_not_present(prepare_for_virtual_metabolization.list_compound_name, extra_compound_names)
append_to_list_if_not_present(prepare_for_virtual_metabolization.list_smiles, extra_smiles)

Initial number of items in the list = 7
Final number of items in the list = 9
Initial number of items in the list = 7
Final number of items in the list = 8


['CCCCCCCCCCCCCCCC(=O)OCC(O)C1OCC(O)C1O',
 'CCOC(=O)C1=NOC(c2ccccc2)(c2ccccc2)C1',
 'N=C(NCCCCCCNC(=N)NC(=N)Nc1ccc(Cl)cc1)NC(=N)Nc1ccc(Cl)cc1',
 'CC(C)N=C(N)N=C(N)Nc1ccc(Cl)cc1',
 'CCCCCCCCCCCC(=O)N(CCO)CCO',
 'COc1cc2occc2c2oc(=O)ccc12',
 'CC(CCC(=O)O)C1CCC2C3C(O)CC4CC(O)CCC4(C)C3CCC12C',
 'C[C@@]1(C[C@H]([C@]2([C@@H]1[C@@H](OC=C2)O[C@H]3[C@@H]([C@H]([C@@H]([C@H](O3)CO)O)O)O)O)O)OC(=O)/C=C/c4ccccc4CC1CCC2(CCC3(C(=CCC4C3(CCC5C4(CCC(C5(C)C(=O)O)O)C)C)C2C1C)C)C']

# Mandatory - Choose between SyGMa (A) or BioTransformer (B) for virtual metabolization

#### A - SyGMa generates specifically human biotransformation of phase 1 and/or 2. 
It takes generally couple minutes to compute. More informations from the paper (https://doi.org/10.1002/cmdc.200700312).

#### B - BioTransformer generates biotransformation in mammals, their gut microbiota, as well as the soil/aquatic microbiota. 
It takes more time to compute. More information from the paper ([https://doi.org/10.1186/s13321-018-0324-5](https://doi.org/10.1186/s13321-018-0324-5)).

# A - Virtual metabolization with SyGMa

SyGMa is a python library for the Systematic Generation of potential Metabolites. See [SyGMa: combining expert knowledge and empirical scoring in the prediction of metabolites](https://doi.org/10.1002/cmdc.200700312) and [https://github.com/3D-e-Chem/sygma](https://github.com/3D-e-Chem/sygma).

Please cite their work:
Ridder, L., & Wagener, M. (2008) [SyGMa: combining expert knowledge and empirical scoring in the prediction of metabolites](https://doi.org/10.1002/cmdc.200700312). ChemMedChem, 3(5), 821-832.


### IMPORTANT -> Change the parameters below as needed
> Define the ruleset and the number of phase 1/2 reaction cyles to apply in the SyGMA scenario. For example 2 cycles for phase 1 `phase_1_cycle = 2`. Using a value > 1 will be slow.

> Define the maximum number of SyGMa candidates outputted (consider the number of reaction cycles). Suggested value `top_sygma_candidates = 15`

> Run SyGMa.

In [13]:
# Define the number of metabolization cycles (1-3). If the number of cycle is more than 1, it can be slow.
phase_1_cycle = 1
phase_2_cycle = 1
          
#Top metabolites predicted by SyGMa to output (ranked by highest score)
top_sygma_candidates = 10

### Run the cell below for running SyGMa (Fast !)

No need to change the content of cell below

In [14]:
run_sygma_batch(prepare_for_virtual_metabolization.list_smiles, prepare_for_virtual_metabolization.list_compound_name, 
                phase_1_cycle, phase_2_cycle, top_sygma_candidates, 'results_vm-NAP_SyGMa.tsv')

=== Starting SyGMa computation ===
Number of compounds = 8
Batch_size = 13
If you are running many compounds or cycles, and maxing out RAM memory available, you can decrease the batch size. Otherwise the value can be increased for faster computation.
Please wait
Batch 1/1 completed with 80 metabolites
Number of SyGMA candidates = 80
Number of unique SyGMA candidates = 80
===== COMPLETED =====


When completed, download the full SyGMa results in the left side panel->
['results_vm-NAP_SyGMa.tsv'](./results_vm-NAP_SyGMa.tsv).

## Export the SyGMa results for NAP
See the documentation for custom database in [NAP](https://ccms-ucsd.github.io/GNPSDocumentation/nap/#structure-database) and how to run NAP on GNPS [https://ccms-ucsd.github.io/GNPSDocumentation/nap/#structure-database](https://ccms-ucsd.github.io/GNPSDocumentation/nap/#structure-database).

In [15]:
export_for_NAP('results_vm-NAP_SyGMa.tsv')

Number of metabolites = 80
Number of unique metabolites considered = 68


View/Download the results for NAP in the left side panel->
['results_vm-NAP_SyGMa_NAP.tsv'](./results_vm-NAP_SyGMa_NAP.tsv).

To download: Go into File/Download or right-clic on the file in the left panel

## Export the SyGMa results for SIRIUS

See the documentation to generate the SIRIUS [custom database here](https://boecker-lab.github.io/docs.sirius.github.io/cli-standalone/#custom-database-tool).

In [16]:
export_for_SIRIUS('results_vm-NAP_SyGMa.tsv')

Number of metabolites = 80
Number of unique metabolites considered = 68


Download the results for SIRIUS in the left side panel->
['results_vm-NAP_SyGMa_SIRIUS.tsv'](./results_vm-NAP_SyGMa_SIRIUS.tsv).

# B - Virtual metabolization with BioTransformer (It is slow !)

BioTransformer is a software tool that predicts small molecule metabolism in mammals, their gut microbiota, as well as the soil/aquatic microbiota. BioTransformer also assists scientists in metabolite identification, based on the metabolism prediction. More information from the paper [[https://doi.org/10.1186/s13321-018-0324-5](https://doi.org/10.1186/s13321-018-0324-5)] and [[https://bitbucket.org/djoumbou/biotransformerjar/src/master/](https://bitbucket.org/djoumbou/biotransformerjar/src/master/)].

### Citation

Djoumbou-Feunang, Y., Fiamoncini, J., Gil-de-la-Fuente, A. et al. [BioTransformer: a comprehensive computational tool for small molecule metabolism prediction and metabolite identification.](https://doi.org/10.1186/s13321-018-0324-5) J Cheminform 11, 2 (2019).

### Install BioTransformer [it can be ran once]
It requires `curl` and `java`.

In [17]:
!java -version
!rm -r biotransformer.zip biotransformer/
!curl https://bitbucket.org/djoumbou/biotransformerjar/get/f47aa4e3c0da.zip -o biotransformer.zip
!unzip -q -d biotransformer biotransformer.zip
!cp -r biotransformer/djoumbou-biotransformerjar-f47aa4e3c0da/. .
!rm -r biotransformer.zip biotransformer/

openjdk version "1.8.0_112"
OpenJDK Runtime Environment (Zulu 8.19.0.1-linux64) (build 1.8.0_112-b16)
OpenJDK 64-Bit Server VM (Zulu 8.19.0.1-linux64) (build 25.112-b16, mixed mode)
rm: cannot remove 'biotransformer.zip': No such file or directory
rm: cannot remove 'biotransformer/': No such file or directory
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 68.9M  100 68.9M    0     0  7339k      0  0:00:09  0:00:09 --:--:-- 8432k


#### Specify the parameters of BioTransformer

`type_of_biotransformation` : -b,--bt Type <BioTransformer Option> The type of description: Type of biotransformer - EC-based (`ecbased`), CYP450 (`cyp450`), Phase II (`phaseII`), Human gut microbial (`hgut`), human super transformer* (`superbio`, or `allHuman`), Environmental microbial** (`envimicro`).

(* ) While the `superbio` option runs a set number of transformation steps in a pre-defined order (e.g. deconjugation first, then Oxidation/reduction, etc.), the `allHuman` option predicts all possible metabolites from any applicable reaction (Oxidation, reduction, (de-)conjugation) at each step.

(** ) For the environmental microbial biodegradation, all reactions (aerobic and anaerobic) are reported, and not only the aerobic biotransformations (as per default in the EAWAG BBD/PPS system).
    
`number_of_steps`  -s,--nsteps <Number of steps> The number of steps for the prediction. This option can be set by the user for the EC-based, CYP450, Phase II, and Environmental microbial biotransformers. The default value is `1`.

In [18]:
type_of_biotransformation = 'hgut'
number_of_steps = 1

run_biotransformer(prepare_for_virtual_metabolization.list_smiles,prepare_for_virtual_metabolization.list_compound_name,
                   type_of_biotransformation, number_of_steps, 'results_vm-NAP_BioTransformer.tsv')
print(' ====> Biotransformer computation is finally completed !!! ')

     Number of compounds being virtually metabolized with BioTransformer =  8
     Biotransformation: hgut
     Please wait for the computation ...
0    [main] INFO  net.sf.jnati.deploy.artefact.ConfigManager  - Loading global configuration
6    [main] DEBUG net.sf.jnati.deploy.artefact.ConfigManager  - Loading defaults: jar:file:/home/jovyan/biotransformer-1.1.5.jar!/META-INF/jnati/jnati.default-properties
7    [main] INFO  net.sf.jnati.deploy.artefact.ConfigManager  - Loading artefact configuration: jniinchi-1.03_1
8    [main] DEBUG net.sf.jnati.deploy.artefact.ConfigManager  - Loading instance defaults: jar:file:/home/jovyan/biotransformer-1.1.5.jar!/META-INF/jnati/jnati.instance.default-properties
10   [main] INFO  net.sf.jnati.deploy.repository.ClasspathRepository  - Searching classpath for: jniinchi-1.03_1-LINUX-AMD64
12   [main] INFO  net.sf.jnati.deploy.repository.LocalRepository  - Searching local repository for: jniinchi-1.03_1-LINUX-AMD64
12   [main] DEBUG net.sf.jnati.deplo

Download the full BioTransformer results in the left side panel->
['results_vm-NAP_BioTransformer.tsv'](./results_vm-NAP_BioTransformer.tsv).

## Export the BioTransformer results for NAP

See the documentation for custom database in [NAP](https://ccms-ucsd.github.io/GNPSDocumentation/nap/#structure-database) and how to run NAP on GNPS [https://ccms-ucsd.github.io/GNPSDocumentation/nap/#structure-database](https://ccms-ucsd.github.io/GNPSDocumentation/nap/#structure-database).

In [19]:
export_for_NAP('results_vm-NAP_BioTransformer.tsv')

Number of metabolites = 38
Number of unique metabolites considered = 20


Download the BioTransformer results for NAP in the left side panel->
['results_vm-NAP_BioTransformer_NAP.tsv'](./results_vm-NAP_BioTransformer_NAP.tsv).

## Export the BioTransformer results for SIRIUS

See the documentation to generate the SIRIUS [custom database here](https://boecker-lab.github.io/docs.sirius.github.io/cli-standalone/#custom-database-tool).

In [20]:
export_for_SIRIUS('results_vm-NAP_BioTransformer.tsv')

Number of metabolites = 38
Number of unique metabolites considered = 20


Download the BioTransformer results for NAP in the left side panel->
['results_vm-NAP_BioTransformer_SIRIUS.tsv'](./results_vm-NAP_BioTransformer_SIRIUS.tsv).