## DEMO USAGE: GNPS downloader and post-processing

In [4]:
import sys

# Make the helper modules in ../lib importable from this notebook
sys.path.append('../lib')
from consolidate_structures import *
from gnps_download_results import *
from gnps_results_postprocess import *

## Download classical molecular networking

In [9]:
# Classical molecular networking
gnps_download_results(job_id='bbee697a63b1400ea585410fafc95723', output_folder='gnps_results')

This is the GNPS job link: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=bbee697a63b1400ea585410fafc95723
Downloading the following content: https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task=bbee697a63b1400ea585410fafc95723&view=view_all_annotations_DB


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 29.5M    0 29.5M    0     0  2521k      0 --:--:--  0:00:11 --:--:-- 3596k


GNPS job results were succesfully downloaded as: gnps_results.zip
GNPS job results were succesfully extracted into the folder: gnps_results
   CLASSICAL MOLECULAR NETWORKING job detected
      199 spectral library annotations in the job.
      9643 nodes in the network (including single nodes)
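
The two URLs printed in the log above follow a simple pattern built from the job ID. As a minimal sketch of that pattern (the helper name `gnps_job_urls` is hypothetical and is not part of `gnps_download_results`; only the URL shapes are taken from the log output):

```python
from urllib.parse import urlencode

# Base URL as it appears in the log output above
GNPS_BASE = "https://gnps.ucsd.edu/ProteoSAFe"


def gnps_job_urls(job_id, view="view_all_annotations_DB"):
    """Return (status_url, download_url) for a GNPS job ID.

    Hypothetical helper: it only reproduces the URL pattern seen in the log.
    """
    status_url = f"{GNPS_BASE}/status.jsp?{urlencode({'task': job_id})}"
    download_url = (
        f"{GNPS_BASE}/DownloadResult?{urlencode({'task': job_id, 'view': view})}"
    )
    return status_url, download_url


status_url, download_url = gnps_job_urls("bbee697a63b1400ea585410fafc95723")
print(status_url)
print(download_url)
```

Building the query string with `urlencode` rather than string concatenation keeps the URL valid if a job ID or view name ever contains characters that need escaping.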


## Download feature-based molecular networking

In [10]:
# Feature-based molecular networking
gnps_download_results(job_id='2047c735fc3546f7a3a32c78245edccf', output_folder='gnps_results_fbmn')

This is the GNPS job link: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=2047c735fc3546f7a3a32c78245edccf
Downloading the following content: https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task=2047c735fc3546f7a3a32c78245edccf&view=view_all_annotations_DB


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4675k    0 4675k    0     0   579k      0 --:--:--  0:00:08 --:--:--  850k

GNPS job results were succesfully downloaded as: gnps_results_fbmn.zip
GNPS job results were succesfully extracted into the folder: gnps_results_fbmn
   FEATURE-BASED MOLECULAR NETWORKING job detected - Version > 28
      206 spectral library annotations in the job.
      960 nodes in the network (including single nodes).


100 4999k    0 4999k    0     0   596k      0 --:--:--  0:00:08 --:--:--  878k


## Consolidate structures

In [11]:
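# df_annotations is read as an attribute of the gnps_download_results function;
# it appears to hold the annotation table from the most recent download (here, the FBMN job).
gnps_annotations_consolidated = consolidate_and_convert_structures(gnps_download_results.df_annotations, prefix='', smiles='Smiles', inchi='INCHI')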

 ==== Consolidating structures from SMILES and/or InChI ====
Both SMILES and InChI were inputted
Converting SMILES to mol object
Succesfully converted to mol object: 148
Exception to the parsing: 0
Not available: 59
Converting INCHI to mol object
Succesfully converted to mol object: 155
Exception to the parsing: 0
Not available: 52
Consolidating the lists
Total mol object from the list 1 = 148
Mol object consolidated from list 2 = 10
Consolidated structures = 158
Converting mol objects to SMILES iso
Converting mol objects to SMILES
Converting mol objects to InChI
Converting mol objects to InChIKey
End
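
The counts in the log above (148 structures from list 1, 10 more from list 2, 158 in total) suggest a "prefer the first list, fall back to the second" merge. The following is a minimal sketch of that idea only, assuming this is roughly how the consolidation behaves. The function name `consolidate` is hypothetical, and plain strings stand in for the parsed mol objects.

```python
def consolidate(primary, secondary):
    """Merge two equally long lists, preferring entries from `primary`.

    An entry of None means "no structure could be parsed from that source".
    Returns the merged list plus how many entries came from each list.
    """
    if len(primary) != len(secondary):
        raise ValueError("both lists must describe the same annotations")

    merged = []
    from_primary = from_secondary = 0
    for first, second in zip(primary, secondary):
        if first is not None:
            merged.append(first)
            from_primary += 1
        elif second is not None:
            merged.append(second)
            from_secondary += 1
        else:
            merged.append(None)  # neither source gave a structure
    return merged, from_primary, from_secondary


# Placeholder strings stand in for mol objects parsed from SMILES and InChI
smiles_mols = ["mol_A", None, None, "mol_D"]
inchi_mols = ["mol_A", "mol_B", None, None]

merged, n_primary, n_secondary = consolidate(smiles_mols, inchi_mols)
print(merged)
print(f"from list 1 = {n_primary}, from list 2 = {n_secondary}")
```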


## Filter annotations

In [12]:
gnps_annotations_filtered = gnps_filter_annotations(
    gnps_annotations_consolidated,
    'Consol_InChI',
    ionisation_mode='pos',
    max_ppm_error=10,
    min_cosine=0.6,
    shared_peaks=6,
    max_spec_charge=1,
)

Initial number of annotations: 207
Remaining after ionisation mode filtering: 207
Remaining after max_ppm_error filtering: 193
Remaining after min_cosine filtering: 193
Remaining after number of shared_peaks filtering: 165
Remaining after number of spectrum charge filtering: 165
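
The log above reads like a chain of filters applied one after another, each reporting how many annotations remain. A minimal sketch of such a chain follows. It is not the implementation of `gnps_filter_annotations`: the function name, the dictionary keys (`ion_mode`, `ppm_error`, `cosine`, `shared_peaks`, `spec_charge`), and the exact comparisons (absolute ppm error, inclusive thresholds) are all assumptions made for illustration.

```python
def filter_annotations_sketch(rows, ion_mode="pos", max_ppm_error=10,
                              min_cosine=0.6, shared_peaks=6, max_spec_charge=1):
    """Apply the filters in sequence and report the remaining count after each."""
    steps = [
        # Assumed: ion mode is matched on its prefix, e.g. "Positive" vs "pos"
        ("ionisation mode", lambda r: r["ion_mode"].lower().startswith(ion_mode)),
        # Assumed: the ppm error is compared as an absolute value
        ("max_ppm_error", lambda r: abs(r["ppm_error"]) <= max_ppm_error),
        ("min_cosine", lambda r: r["cosine"] >= min_cosine),
        ("shared_peaks", lambda r: r["shared_peaks"] >= shared_peaks),
        ("spectrum charge", lambda r: r["spec_charge"] <= max_spec_charge),
    ]
    print(f"Initial number of annotations: {len(rows)}")
    for label, keep in steps:
        rows = [r for r in rows if keep(r)]
        print(f"Remaining after {label} filtering: {len(rows)}")
    return rows


# Made-up example annotations, not taken from the GNPS jobs above
example_rows = [
    {"ion_mode": "Positive", "ppm_error": 2.1, "cosine": 0.85, "shared_peaks": 9, "spec_charge": 1},
    {"ion_mode": "Positive", "ppm_error": 25.0, "cosine": 0.90, "shared_peaks": 12, "spec_charge": 1},
    {"ion_mode": "Positive", "ppm_error": 1.0, "cosine": 0.70, "shared_peaks": 4, "spec_charge": 1},
    {"ion_mode": "Negative", "ppm_error": 0.5, "cosine": 0.95, "shared_peaks": 10, "spec_charge": 1},
]

kept = filter_annotations_sketch(example_rows)
```

Because the filters run in sequence, a step that reports no change (as the `min_cosine` step does in the log above) only means that every annotation failing that criterion had already been removed earlier or that none failed it.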


## Clean up annotations

In [13]:
cleaned_up_gnps_annotations = gnps_clean_up_annotations(
    gnps_annotations_filtered,
    'Consol_InChI',
    remove_C_containing_in_source_fragment=True,
)

Initial number of annotations: 165
After removing annotations without structure: 118
After intrinsically charged molecules removed: 118
After carbon containing adducts filtering: 99
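
The three clean-up steps reported above can be pictured as follows. This is a rough sketch under stated assumptions, not the logic of `gnps_clean_up_annotations`: it assumes that "intrinsically charged" can be detected from the InChI charge layer (`/q`), and that a carbon-containing adduct can be spotted by a capital `C` not followed by a lowercase letter in the adduct string (so `Cl` or `Ca` do not count). The function name and the keys `inchi` and `adduct` are hypothetical.

```python
import re

# A capital C that is not the start of a two-letter element symbol (Cl, Ca, Cs, ...)
CARBON_PATTERN = re.compile(r"C(?![a-z])")


def clean_up_sketch(rows, remove_carbon_adducts=True):
    """Drop rows without a structure, with a charge layer, or with a carbon adduct."""
    # 1. Remove annotations without a structure
    rows = [r for r in rows if r.get("inchi")]
    # 2. Remove molecules whose InChI carries a charge layer (assumed criterion)
    rows = [r for r in rows if "/q" not in r["inchi"]]
    # 3. Optionally remove adducts that contain carbon (assumed criterion)
    if remove_carbon_adducts:
        rows = [r for r in rows if not CARBON_PATTERN.search(r["adduct"])]
    return rows


# Made-up example annotations
example_annotations = [
    # Caffeine, protonated: kept
    {"inchi": "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3",
     "adduct": "[M+H]+"},
    # No structure available: removed in step 1
    {"inchi": "", "adduct": "[M+H]+"},
    # Choline, a permanently charged cation: removed in step 2
    {"inchi": "InChI=1S/C5H14NO/c1-6(2,3)4-5-7/h7H,4-5H2,1-3H3/q+1",
     "adduct": "[M]+"},
    # Ethanol with an acetonitrile adduct: removed in step 3
    {"inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
     "adduct": "[M+CH3CN+H]+"},
]

cleaned = clean_up_sketch(example_annotations)
print(len(cleaned))
```

The adduct check is deliberately crude: abbreviations that hide a carbon atom, such as `FA` for formic acid, would pass through unnoticed.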


## Get molecular formula proxy

In [14]:
cleaned_up_gnps_annotations_formula = get_molecular_formula_from_inchi(cleaned_up_gnps_annotations, 'Consol_InChI')

Initial number of annotations: 99
After carbon containing adducts filtering: 99
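
For typical organic molecules, the first layer of an InChI string after the version prefix is the chemical formula, which is presumably why the formula obtained this way is called a proxy. A minimal sketch of reading that layer is shown below. The function name `formula_from_inchi` is hypothetical and is not claimed to match the internals of `get_molecular_formula_from_inchi`.

```python
def formula_from_inchi(inchi):
    """Return the formula layer of an InChI string, or None if it is unusable."""
    if not inchi or not inchi.startswith("InChI="):
        return None
    # Layers are separated by "/": version prefix, formula, connectivity, ...
    layers = inchi.split("/")
    if len(layers) < 2 or not layers[1]:
        return None
    return layers[1]


caffeine = "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3"
print(formula_from_inchi(caffeine))
print(formula_from_inchi("not an InChI"))
```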
