# Create molecular networks
This notebook creates a graphml file for the case study data so that the molecular networks can be visualized in Cytoscape.

# Download data from zenodo
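The MS2DeepScore model (with its settings file) and the case study spectra are downloaded from Zenodo. The MS2Query annotations are not downloaded here; they are available from the same Zenodo record as the spectra and are only needed in the Cytoscape step at the end.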

In [5]:
import requests
import os
from tqdm import tqdm

def download_file(link, file_name):
    # Check first, so no HTTP request is made for a file that is already present
    if os.path.exists(file_name):
        print(f"The file {file_name} already exists, the file won't be downloaded")
        return
    response = requests.get(link, stream=True)
    response.raise_for_status()  # Fail early instead of saving an HTTP error page to disk
    # The content-length header may be missing; the progress bar then shows no total
    total_size = int(response.headers.get('content-length', 0))

    with open(file_name, "wb") as f, tqdm(desc="Downloading file", total=total_size, unit='B', unit_scale=True, unit_divisor=1024) as bar:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                bar.update(len(chunk))  # Update progress bar by the chunk size
    
model_file_name = "ms2deepscore_model.pt"
case_study_spectra_file_name = "case_study_spectra.mgf"

download_file("https://zenodo.org/records/14290920/files/settings.json?download=1", "ms2deepscore_settings.json")
download_file("https://zenodo.org/records/14290920/files/ms2deepscore_model.pt?download=1", model_file_name)
download_file("https://zenodo.org/records/14535374/files/cleaned_spectra_pos_neg_with_numbering.mgf?download=1", case_study_spectra_file_name)


Downloading file: 2.18kB [00:00, 70.7kB/s]
Downloading file: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 397M/397M [00:03<00:00, 107MB/s]
Downloading file: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.57M/5.57M [00:00<00:00, 26.9MB/s]


### Load MS2Deepscore model

In [6]:
from ms2deepscore.models import load_model
model = load_model(model_file_name)



### Create spectral similarity scores
The spectrum file "cleaned_spectra_pos_neg_with_numbering.mgf" (saved above as "case_study_spectra.mgf") was created in pre_processing_spectra.

In [7]:
from matchms.Pipeline import Pipeline, create_workflow
from ms2deepscore import MS2DeepScore

workflow = create_workflow(
    query_filters=[],
    score_computations=[
        [MS2DeepScore, {"model": model}],
        ],
)
pipeline = Pipeline(workflow)
report = pipeline.run(case_study_spectra_file_name)



Processing spectra: 2909it [00:00, 3123.46it/s]
2909it [00:05, 547.08it/s]


### Create a network
`pipeline.scores` contains all pairwise similarity scores. To make a molecular network, only a subset of these scores is kept as edges. An edge is stored only if the score is at least 0.85, each node (spectrum) is connected to at most its 10 highest-scoring neighbours, and a link is added only if each spectrum is also in the top 10 of the other.

This is a common approach for creating molecular networks (exact settings vary). It gives visually readable networks by preventing giant hairballs.
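
The filtering rule described above can be illustrated on a toy score matrix. This is a minimal pure-Python sketch, not the matchms implementation: the function name `mutual_top_k_edges` is hypothetical, the cutoff is assumed to be inclusive, and ties are broken by list order.

```python
def mutual_top_k_edges(scores, score_cutoff=0.85, max_links=10):
    """Return (i, j, score) edges with i < j that pass a mutual top-k filter.

    scores: square, symmetric list of lists with pairwise similarity scores.
    """
    n = len(scores)
    top = []
    for i in range(n):
        # Candidate neighbours of i: every other node at or above the cutoff
        candidates = [j for j in range(n) if j != i and scores[i][j] >= score_cutoff]
        # Keep only the max_links candidates with the highest scores
        candidates.sort(key=lambda j: scores[i][j], reverse=True)
        top.append(set(candidates[:max_links]))
    # An edge survives only if each node is in the other's top list
    return [
        (i, j, scores[i][j])
        for i in range(n)
        for j in sorted(top[i])
        if i < j and i in top[j]
    ]


# Toy example with four spectra
scores = [
    [1.00, 0.95, 0.90, 0.10],
    [0.95, 1.00, 0.20, 0.88],
    [0.90, 0.20, 1.00, 0.30],
    [0.10, 0.88, 0.30, 1.00],
]
print(mutual_top_k_edges(scores, score_cutoff=0.85, max_links=1))
# [(0, 1, 0.95)]
```

With `max_links=1`, spectrum 2 has spectrum 0 as its best match, but spectrum 0 prefers spectrum 1, so the 0-2 link is dropped. Raising `max_links` to 2 keeps the 0-2 and 1-3 links as well.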

In [8]:
from matchms.networking import SimilarityNetwork

# Define settings
ms2ds_network = SimilarityNetwork(
    identifier_key="query_spectrum_nr",
    score_cutoff=0.85,  # higher numbers produce more isolated sub-graphs
    max_links=10,  # lower number makes sparser networks
    link_method="mutual",  # mutual means: link is only added if in top list of both nodes
)

# Compute the graph (takes some time)
ms2ds_network.create_network(pipeline.scores, score_name="MS2DeepScore")

### Save to graphml

In [4]:
# Export to graphml
ms2ds_network.export_to_graphml("ms2ds_graph_min_0_85_score_10_links.graphml")
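
Before opening the file in Cytoscape, it can be useful to check that the export contains a plausible number of nodes and edges. The sketch below uses only the standard library and assumes the exported file uses the standard GraphML XML namespace; the helper name `count_nodes_and_edges` and the inline example graph are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace, in the form ElementTree expects
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_nodes_and_edges(graphml_text):
    """Count <node> and <edge> elements in a GraphML document given as text."""
    root = ET.fromstring(graphml_text)
    n_nodes = len(root.findall(f".//{GRAPHML_NS}node"))
    n_edges = len(root.findall(f".//{GRAPHML_NS}edge"))
    return n_nodes, n_edges


# Tiny stand-in document; for the real file, read its text from disk instead, e.g.
# pathlib.Path("ms2ds_graph_min_0_85_score_10_links.graphml").read_text()
example = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="1"/>
    <node id="2"/>
    <node id="3"/>
    <edge source="1" target="2"/>
  </graph>
</graphml>"""

print(count_nodes_and_edges(example))
# (3, 1)
```

For the case study, the node count would be expected to match the number of spectra processed above, assuming every spectrum is exported as a node.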


### Load into cytoscape

The graphml file can be loaded into Cytoscape (https://cytoscape.org/), an open source platform for visualizing graphs.


To recreate the case study results:
- Open Cytoscape.
- Load the graphml file created above.
- Load the MS2Query annotations as a table (see the file add_annotations.ipynb). The annotations can be downloaded from https://zenodo.org/records/14535374
- Set the style settings (or load a style file).
- Set up chemViz to visualize chemical information.
- Explore your data!