# Jupyter notebook to integrate SIRIUS results into FBMN

This notebook creates a Cytoscape-compatible network file (GraphML) in which the SIRIUS annotations are mapped onto the nodes of a feature-based molecular network (FBMN) from GNPS jobs.

> **The latest version of the notebook is available at** -> [https://github.com/mwang87/GNPS_Sirius_Integration_Notebooks](https://github.com/mwang87/GNPS_Sirius_Integration_Notebooks)

! **IMPORTANT**: This Binder notebook is a temporary instance running in the cloud. Save a copy of the notebook if you want to access it again later.

## Indicate the GNPS job task IDs

>1) **Step 1 - Run a FBMN job and a SIRIUS job on GNPS**
>> **For more information about FBMN** -> [https://ccms-ucsd.github.io/GNPSDocumentation/featurebasedmolecularnetworking/](https://ccms-ucsd.github.io/GNPSDocumentation/featurebasedmolecularnetworking/) \
>> **For more information about SIRIUS** -> [https://ccms-ucsd.github.io/GNPSDocumentation/sirius/](https://ccms-ucsd.github.io/GNPSDocumentation/sirius/)

>2) **Step 2 - Replace the task IDs in the cell below with those of your GNPS jobs**
>>Note that the same spectral summary file (MGF file) must have been used in both workflows, and both jobs must have completed.\
For MZmine users, it is recommended to use the MS2 spectral summary exported with the "GNPS Export" module for the FBMN job, and the MS1/MS2 spectral summary exported with the "SIRIUS export" module for the SIRIUS job.

> 3) **Step 3 - Run all the cells below**

In [1]:
# Update the tasks here from GNPS
FBMN_TASK = "bb6766d6f48c4cb6bd84fd620da9cae5"
SIRIUS_TASK = "72542358ffca4eeda6a4b50d06b91e60"
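
A pasted task ID can easily pick up stray characters or part of a URL. The sketch below is an optional sanity check, not part of the original notebook. It assumes that GNPS task IDs always have the shape of the two example IDs above (32 lowercase hexadecimal characters); `looks_like_task_id` is a hypothetical helper name.

```python
import re

# Assumption: GNPS task IDs are 32 lowercase hexadecimal characters,
# as in the two example IDs used in this notebook.
TASK_ID_PATTERN = re.compile(r"[0-9a-f]{32}")


def looks_like_task_id(task):
    """Return True if `task` has the same shape as the example task IDs."""
    return bool(TASK_ID_PATTERN.fullmatch(task.strip()))


# The example IDs are repeated here so the sketch runs on its own.
for name, task in [
    ("FBMN_TASK", "bb6766d6f48c4cb6bd84fd620da9cae5"),
    ("SIRIUS_TASK", "72542358ffca4eeda6a4b50d06b91e60"),
]:
    status = "looks valid" if looks_like_task_id(task) else "looks malformed"
    print(name, status)
```

This check only looks at the format of the ID. It cannot tell whether the job exists or has completed.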

In [2]:
import requests
import pandas as pd
import networkx as nx

In [3]:
# Saving FBMN Network
r = requests.get("http://gnps.ucsd.edu/ProteoSAFe/DownloadResultFile?task={}&block=main&file=gnps_molecular_network_graphml/".format(FBMN_TASK))
with open('fbmn_network.graphml', 'wb') as f:
    f.write(r.content)
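
The download pattern in this cell is repeated in the next three cells, and none of them checks whether the request succeeded. As a possible variation, the sketch below wraps the pattern in two hypothetical helpers, `result_url` and `download_result`, which are not part of the original notebook. The URL template is copied from the cells of this notebook, and the 60-second timeout is an arbitrary assumption.

```python
# URL template copied from the download cells of this notebook.
GNPS_RESULT_URL = (
    "http://gnps.ucsd.edu/ProteoSAFe/DownloadResultFile"
    "?task={}&block=main&file={}"
)


def result_url(task, result_path):
    """Build the download URL for one result file of a GNPS task."""
    return GNPS_RESULT_URL.format(task, result_path)


def download_result(task, result_path, local_name, timeout=60):
    """Download one result file and stop with an error if the request failed."""
    import requests  # imported here so result_url can be used without it

    r = requests.get(result_url(task, result_path), timeout=timeout)
    r.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response
    with open(local_name, "wb") as f:
        f.write(r.content)
    return local_name


# Only the URL is built here; no request is sent.
print(result_url("bb6766d6f48c4cb6bd84fd620da9cae5", "gnps_molecular_network_graphml/"))
```

With these helpers, the cell above could be written as `download_result(FBMN_TASK, "gnps_molecular_network_graphml/", "fbmn_network.graphml")`. Without the `raise_for_status()` call, a failed request can leave an error page saved under the expected file name, and the problem only surfaces later when the file is parsed.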

In [4]:
# Saving Sirius Formulas
r = requests.get("http://gnps.ucsd.edu/ProteoSAFe/DownloadResultFile?task={}&block=main&file=merged_output/formula_identifications.tsv".format(SIRIUS_TASK))
with open('formula_classifications.tsv', 'wb') as f:
    f.write(r.content)
sirius_df = pd.read_csv("formula_classifications.tsv", sep="\t")

In [5]:
# Saving SIRIUS structure annotations (CSI:FingerID)
r = requests.get("http://gnps.ucsd.edu/ProteoSAFe/DownloadResultFile?task={}&block=main&file=merged_output/compound_identifications.tsv".format(SIRIUS_TASK))
with open('compound_classifications.tsv', 'wb') as f:
    f.write(r.content)
fingerid_df = pd.read_csv("compound_classifications.tsv", sep="\t")

In [6]:
# Saving Sirius Canopus Classifications
r = requests.get("http://gnps.ucsd.edu/ProteoSAFe/DownloadResultFile?task={}&block=main&file=merged_output/canopus_classification.tsv".format(SIRIUS_TASK))
with open('canopus_classifications.tsv', 'wb') as f:
    f.write(r.content)
canopus_df = pd.read_csv("canopus_classifications.tsv", sep="\t")
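
Step 2 requires that both jobs were run from the same spectral summary file, because the annotations are matched to network nodes by their `scan` number. One way to spot a mismatch early is to count how many annotated scans actually exist as nodes. The sketch below uses a hypothetical helper, `scan_overlap`, and toy values in place of the real network and tables.

```python
def scan_overlap(node_ids, scans):
    """Return (matched, total, fraction) of annotated scans found among the nodes.

    Both inputs are converted to strings, mirroring the str(result["scan"])
    lookup used when the annotations are added to the network.
    """
    nodes = {str(n) for n in node_ids}
    scan_ids = {str(s) for s in scans}
    matched = len(scan_ids & nodes)
    total = len(scan_ids)
    fraction = matched / total if total else 0.0
    return matched, total, fraction


# Toy values standing in for the network nodes and the "scan" column.
matched, total, fraction = scan_overlap(["1", "2", "3", "4"], [2, 3, 99])
print("{} of {} annotated scans found in the network ({:.0%})".format(matched, total, fraction))
```

Once the network has been loaded in the next cell, the same function could be called as `scan_overlap(G.nodes, sirius_df["scan"])`. A fraction close to zero would suggest that the two jobs were not run on the same spectral summary file.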

In [7]:
# Integrating into Graphml
G = nx.read_graphml("fbmn_network.graphml")

# Node attributes are accessed through G.nodes[...], which requires NetworkX 2.0 or later
# (the older G.node alias was removed in NetworkX 2.4)

# Adding sirius information
for result in sirius_df.to_dict(orient="records"):
    scan = str(result["scan"])
    if scan in G:
        G.nodes[scan]["sirius:molecularFormula"] = result["molecularFormula"]
        G.nodes[scan]["sirius:adduct"] = result["adduct"]
        G.nodes[scan]["sirius:Zodiac_Score"] = result["Zodiac_Score"]
        G.nodes[scan]["sirius:TreeIsotope_Score"] = result["TreeIsotope_Score"]
        G.nodes[scan]["sirius:Isotope_Score"] = result["Isotope_Score"]
        G.nodes[scan]["sirius:explainedPeaks"] = result["explainedPeaks"]
        G.nodes[scan]["sirius:explainedIntensity"] = result["explainedIntensity"]

# Adding CSI:FingerID information
for result in fingerid_df.to_dict(orient="records"):
    scan = str(result["scan"])
    if scan in G:
        G.nodes[scan]["csifingerid:smiles"] = result["smiles"]
        G.nodes[scan]["csifingerid:Confidence_Score"] = result["Confidence_Score"]
        G.nodes[scan]["csifingerid:#adducts"] = result["#adducts"]
        G.nodes[scan]["csifingerid:dbflags"] = result["dbflags"]

# Adding canopus information
for result in canopus_df.to_dict(orient="records"):
    scan = str(result["scan"])
    if scan in G:
        G.nodes[scan]["canopus:subclass"] = result["subclass"]
        G.nodes[scan]["canopus:class"] = result["class"]
        G.nodes[scan]["canopus:superclass"] = result["superclass"]
        G.nodes[scan]["canopus:most specific class"] = result["most specific class"]

# Writing the annotated network to a new GraphML file
nx.write_graphml(G, "fbmn_sirius_network.graphml")
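
The cell above copies every value as it is, including empty table cells, which pandas reads as `NaN`. As a possible variation, the sketch below builds the prefixed attributes for one row and skips missing values. `prefixed_attributes` is a hypothetical helper that is not part of the original notebook, and the row contents are made-up toy values.

```python
import math


def prefixed_attributes(record, prefix, columns):
    """Map selected columns of one table row to '<prefix>:<column>' attributes.

    Missing values (None or float NaN) are skipped, so the node simply has
    no value for that column.
    """
    attributes = {}
    for column in columns:
        value = record.get(column)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        attributes["{}:{}".format(prefix, column)] = value
    return attributes


# Toy row standing in for one entry of canopus_df.to_dict(orient="records").
row = {
    "scan": 12,
    "class": "Flavonoids",
    "subclass": float("nan"),
    "superclass": "Phenylpropanoids and polyketides",
}
print(prefixed_attributes(row, "canopus", ["subclass", "class", "superclass"]))
```

Inside the loops above, the result could be applied with `G.nodes[scan].update(prefixed_attributes(result, "canopus", ["subclass", "class", "superclass", "most specific class"]))`. Whether skipping empty values is preferable depends on how the columns are used later in Cytoscape.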

>**Step 4 - Now download the generated Cytoscape file**
>>[Click here to download](fbmn_sirius_network.graphml?download=1)

>**Step 5 - Open the Cytoscape file and explore the network**
>>In the downloaded Cytoscape file, the SIRIUS results are available as additional columns in the node table. The columns from each step (molecular formula prediction, structure annotation, and class annotation) carry a distinctive prefix (`sirius`, `csifingerid`, `canopus`). These columns can be used to search node properties and to visualize structures in the network with the chemViz2 plugin.\
>>**For more information about using Cytoscape with the FBMN workflow** -> [https://ccms-ucsd.github.io/GNPSDocumentation/featurebasedmolecularnetworking-cytoscape/](https://ccms-ucsd.github.io/GNPSDocumentation/featurebasedmolecularnetworking-cytoscape/)\
>>**Along with the following general Cytoscape documentation for molecular networking** -> [https://ccms-ucsd.github.io/GNPSDocumentation/cytoscape/](https://ccms-ucsd.github.io/GNPSDocumentation/cytoscape/)
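
Before opening the file in Cytoscape, it can be useful to confirm which annotation columns were actually written. The sketch below lists the node attributes declared in a GraphML document and groups them by the three prefixes described in Step 5. It relies only on the standard library; `annotation_columns` is a hypothetical helper, and the embedded sample is a hand-written stand-in for the real output file.

```python
import io
import xml.etree.ElementTree as ET

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"
PREFIXES = ("sirius", "csifingerid", "canopus")


def annotation_columns(source):
    """Group node attribute names declared in a GraphML file by annotation prefix.

    `source` may be a file name or a file object, as accepted by ET.parse.
    """
    root = ET.parse(source).getroot()
    columns = {prefix: [] for prefix in PREFIXES}
    for key in root.iter(GRAPHML_NS + "key"):
        name = key.get("attr.name", "")
        prefix, _, rest = name.partition(":")
        if key.get("for") == "node" and rest and prefix in columns:
            if name not in columns[prefix]:
                columns[prefix].append(name)
    return columns


# Hand-written sample standing in for fbmn_sirius_network.graphml.
SAMPLE = b"""<?xml version='1.0' encoding='utf-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="sirius:molecularFormula" attr.type="string"/>
  <key id="d1" for="node" attr.name="csifingerid:smiles" attr.type="string"/>
  <key id="d2" for="node" attr.name="canopus:class" attr.type="string"/>
  <key id="d3" for="node" attr.name="precursor mass" attr.type="double"/>
  <graph edgedefault="undirected"/>
</graphml>
"""

for prefix, names in annotation_columns(io.BytesIO(SAMPLE)).items():
    print(prefix, names)
```

For the real output, the call would be `annotation_columns("fbmn_sirius_network.graphml")`. An empty list for a prefix suggests that no node matched the corresponding SIRIUS table.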