In [6]:
from pathlib import Path
import urllib.request

import pandas as pd
from pandas import ExcelWriter

import omigami

# How to use the Omigami API v1
## Introduction

This notebook shows metabolomics researchers how to compute Spec2Vec similarity scores between the MS/MS spectra in a dataset and a large collection of annotated spectra.

It covers loading an MGF file, running the Omigami interface, and interpreting the prediction results.

### 1) (Down)load an MS/MS dataset (MGF format)
You will need the `omigami.py` script that comes with this notebook.

The following Python libraries are also needed:
- `requests`
- `json` (part of the Python standard library)
- `matchms`

You also need a relatively small MS/MS dataset (MGF format) to follow the example below.  
You can download an MS/MS dataset by browsing the [GNPS spectral library](https://gnps-external.ucsd.edu/gnpslibrary), or download a miscellaneous one by clicking [here](https://gnps-external.ucsd.edu/gnpslibrary/GNPS-COLLECTIONS-MISC.mgf).

Please note that, in the MGF file, the precursor m/z (Precursor_MZ) field `PEPMASS` and the m/z-abundance peak pairs are required.
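
As a quick sanity check before calling the API, the sketch below scans MGF text and reports spectra that lack a `PEPMASS` field or peak pairs. It is a minimal standard-library parser written for illustration: the function name `check_mgf_spectra` is hypothetical and not part of `omigami.py`, and it assumes the usual `BEGIN IONS` / `END IONS` block layout, treating any line that starts with a digit and has at least two fields as a peak line.

```python
def check_mgf_spectra(mgf_text):
    """Return a list of (spectrum_number, problem) tuples for an MGF string.

    Hypothetical helper for illustration; not part of omigami.py.
    Spectra are numbered from 1 in file order.
    """
    problems = []
    spectrum_number = 0
    in_block = False
    has_pepmass = False
    n_peaks = 0

    for raw_line in mgf_text.splitlines():
        line = raw_line.strip()
        if line.upper() == "BEGIN IONS":
            # Start of a new spectrum block: reset the per-spectrum state.
            in_block = True
            spectrum_number += 1
            has_pepmass = False
            n_peaks = 0
        elif line.upper() == "END IONS" and in_block:
            if not has_pepmass:
                problems.append((spectrum_number, "missing PEPMASS"))
            if n_peaks == 0:
                problems.append((spectrum_number, "no peaks"))
            in_block = False
        elif in_block and line:
            if line.upper().startswith("PEPMASS="):
                has_pepmass = True
            elif line[0].isdigit() and len(line.split()) >= 2:
                # Assumed peak line: an m/z value followed by an abundance.
                n_peaks += 1
    return problems


# Tiny made-up example: the second spectrum is missing both required parts.
example = """BEGIN IONS
PEPMASS=301.14
CHARGE=1+
100.5 20.0
150.2 80.0
END IONS
BEGIN IONS
CHARGE=1+
END IONS
"""
print(check_mgf_spectra(example))
```

To check a real file, pass its contents, for example `check_mgf_spectra(Path(path_to_mgf).read_text())`.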

In [7]:
# Download from GNPS
url = 'https://gnps-external.ucsd.edu/gnpslibrary/GNPS-COLLECTIONS-MISC.mgf'

home = str(Path.home())
path_to_mgf = f'{home}/Downloads/GNPS-COLLECTIONS-MISC.mgf'  # use your preferred save path here

urllib.request.urlretrieve(url, path_to_mgf)

('/Users/pierre/Downloads/GNPS-COLLECTIONS-MISC.mgf',
 <http.client.HTTPMessage at 0x7fbe0a5eae50>)

In [8]:
# Or point to a local copy
home = str(Path.home())
path_to_mgf = f'{home}/Downloads/GNPS-COLLECTIONS-MISC.mgf'
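
The two cells above can be combined so that the file is downloaded only when no local copy exists yet. The sketch below is one possible way to do this with the standard library; `ensure_local_copy` is a hypothetical helper name, and the `fetch` argument is an assumption added so the download step can be replaced (for instance in tests).

```python
from pathlib import Path
import urllib.request


def ensure_local_copy(url, destination, fetch=urllib.request.urlretrieve):
    """Download `url` to `destination` unless the file already exists.

    Hypothetical helper for illustration. `fetch` is called as
    fetch(url, path_string) and defaults to urllib's urlretrieve.
    Returns the destination as a Path.
    """
    destination = Path(destination)
    if not destination.exists():
        # Create the target folder if needed, then download.
        destination.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, str(destination))
    return destination


# Example (performs a network request on the first run only):
# url = 'https://gnps-external.ucsd.edu/gnpslibrary/GNPS-COLLECTIONS-MISC.mgf'
# path_to_mgf = ensure_local_copy(url, Path.home() / 'Downloads' / 'GNPS-COLLECTIONS-MISC.mgf')
```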

### 2) Call Omigami to compute Spec2Vec scores 

Omigami is a Python wrapper that:
- Builds a JSON payload from the MGF file
- Calls the Omigami API
- Formats the prediction results into readable dataframes

(See the `omigami.py` file that accompanies this notebook.)
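
To give a rough idea of the first step, the sketch below serialises parsed spectra into a JSON string. It does not reproduce the actual Omigami payload: the function name `build_payload` and the field names (`parameters`, `spectra`, `precursor_mz`, `peaks`) are assumptions made for illustration, and the real schema is whatever `omigami.py` defines.

```python
import json


def build_payload(spectra, n_best_spectra=10):
    """Serialise spectra into a JSON string.

    Hypothetical sketch: the field names are illustrative assumptions,
    not the schema used by the Omigami API.
    """
    payload = {
        "parameters": {"n_best_spectra": n_best_spectra},
        "spectra": [
            {
                "precursor_mz": float(spectrum["pepmass"]),
                # Each peak is stored as an [m/z, abundance] pair.
                "peaks": [[float(mz), float(abundance)]
                          for mz, abundance in spectrum["peaks"]],
            }
            for spectrum in spectra
        ],
    }
    return json.dumps(payload)


# Made-up spectrum used only to show the resulting structure.
spectra = [{"pepmass": 301.14, "peaks": [(100.5, 20.0), (150.2, 80.0)]}]
payload = build_payload(spectra, n_best_spectra=5)
print(payload)
```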
____
Please note that the optional `n_best_spectra` parameter sets how many predicted spectra are returned for each input spectrum, i.e. each set of peaks (10 by default).

In the result dataframes, each input spectrum is identified by the number in the dataframe's index name, which refers to its position in the MGF file.  
For example, `matches of spectrum #1` holds the spectrum IDs and Spec2Vec scores of the predicted spectra for the first spectrum in the MGF file.

For each spectrum in the MGF file, the predicted spectra are sorted by their Spec2Vec similarity score (best first).  
Omigami returns the following information about the predicted spectra:
- `score`: the Spec2Vec similarity score between the input spectrum and the predicted spectrum
- `matches of spectrum #`: the spectrum ID of each predicted spectrum found by Spec2Vec for spectrum number # in the MGF file
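
The sketch below shows one way to read such a result with pandas, using a mock dataframe shaped like the description above (a `score` column indexed by spectrum ID). The IDs and scores are invented for illustration, and the 0.7 cut-off is an arbitrary assumption rather than a recommended threshold.

```python
import pandas as pd

# Mock result for one input spectrum; IDs and scores are invented.
matches = pd.DataFrame(
    {"score": [0.91, 0.74, 0.32]},
    index=pd.Index(["ID_A", "ID_B", "ID_C"], name="matches of spectrum #1"),
)

# Spectrum ID of the highest-scoring match (missing scores are skipped).
best_id = matches["score"].idxmax()

# Keep only matches at or above an arbitrary example threshold.
confident = matches[matches["score"] >= 0.7]

print(best_id)
print(list(confident.index))
```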

In [9]:
# Run Omigami
spectra_predictions = omigami.run(mgf_file=path_to_mgf, n_best_spectra=10)

### 3) Save prediction results
Omigami returns a list of dataframes, one per input spectrum. To display a specific dataframe, index into the list:
```python
spectra_predictions[i]  # 'i' refers to the (i+1)th spectrum in the MGF file input
```

Execute the following cell to save the results in an Excel file. For readability, each dataframe is saved in its own sheet.

In [10]:
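home = str(Path.home())

# The context manager saves and closes the workbook on exit
# (`writer.save()` was removed in pandas 2.0).
with pd.ExcelWriter(f"{home}/Downloads/spectra_predictions.xlsx", engine='xlsxwriter') as writer:
    for i, prediction_dataframe in enumerate(spectra_predictions):
        # One sheet per input spectrum, numbered in MGF file order
        prediction_dataframe.to_excel(writer, sheet_name=f'spectrum #{i+1}')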

In [12]:
spectra_predictions[0]

| matches of spectrum #1 | score |
|---|---|
| CCMSLIB00005717626 | |
| CCMSLIB00005879264 | |
| CCMSLIB00004752950 | 0.193805 |
| CCMSLIB00005466063 | 0.187714 |
| CCMSLIB00000006847 | 0.184868 |
| CCMSLIB00005467772 | 0.18338 |
| CCMSLIB00005724675 | 0.181272 |
| CCMSLIB00005723953 | 0.17924 |
| CCMSLIB00001059640 | 0.174597 |
| CCMSLIB00000072118 | 0.173829 |


____