# SpecTUS demo

This notebook is a **demonstrator** of the [SpecTUS model](https://doi.org/10.48550/arXiv.2502.05114). Take it as is: it shows what the model can do, but it is not intended for production use.

In [None]:
# just usual required stuff
import requests
from matchms import Spectrum
from matchms.importing import load_from_msp
from matchms import set_matchms_logger_level
import pandas as pd
import numpy as np
from rdkit import Chem
import matplotlib.pyplot as plt
from time import sleep
set_matchms_logger_level("ERROR")

## API endpoint

The notebook is only a client; it needs a running service that wraps the actual model.
By default, we provide a demonstrator service on a best-effort basis, running on our [Kubernetes cluster](https://docs.cerit.io/en/platform/overview).
- The service behind this endpoint is not guaranteed to be running at all.
- It initially gets a fraction of a CPU and can grow up to 16 CPUs, but only if they are currently available on the cluster.
- Requests are processed serially, first come, first served.
- There is a limit of 100 requests per day from a single IP address. Results are kept for a few hours only.
- There is no persistence: if the service stops for whatever reason, all previous requests are erased.

Having said that, enjoy!

In [None]:
api = 'https://spectus-demo.dyn.cloud.e-infra.cz/predictions'

## Example data

This file is a random selection of 100 spectra from the 1640 entries that remained from the [SWGDRUG MS library](https://www.swgdrug.org/ms.htm) after filtering out compounds
present in our training set. These are therefore guaranteed to be true test data, not seen by the model during training.

Pick one by setting `idx` below, or upload any spectrum you are interested in and read it in. All you need to provide is the list of m/z values in `mz`, the corresponding intensities in `intensity`, and optionally a reference SMILES in `smiles`.
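
If your own spectrum is a plain two-column peak list rather than an MSP file, a small parser is enough to get the two lists. The sketch below is only an illustration: the function name `parse_peak_list`, the accepted separators, and the three-peak example are assumptions, not part of the SpecTUS tooling.

```python
def parse_peak_list(text):
    """Parse lines of 'm/z intensity' pairs into two parallel lists of floats."""
    mz, intensity = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # accept whitespace, comma, or semicolon as column separators
        fields = line.replace(",", " ").replace(";", " ").split()
        if len(fields) < 2:
            raise ValueError(f"expected 'm/z intensity', got: {line!r}")
        mz.append(float(fields[0]))
        intensity.append(float(fields[1]))
    return mz, intensity


# hypothetical three-peak spectrum, for illustration only
example_text = """
# m/z  intensity
41.0   120
58.0   999
91.0   310.5
"""
mz_own, intensity_own = parse_peak_list(example_text)
print(mz_own, intensity_own)
```

Assigning the two returned lists to `mz` and `intensity` (and `None` to `smiles` if there is no reference) would let the rest of the notebook run unchanged. Because a comma is treated as a column separator, files that use a decimal comma would need a different parser.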

In [None]:
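msp_file_path = "SWGDRUG_100.msp"
# load_from_msp yields matchms Spectrum objects one by one
msp = load_from_msp(msp_file_path)
# split the spectra into parallel tuples: m/z lists, intensity lists, SMILES strings
mzs, intensities, smileses = zip(*[(
    list(map(float, s.peaks.mz)),
    list(map(float, s.peaks.intensities)),
    s.metadata.get('canonical_smiles'))
    for s in msp])
print('# spectra read:', len(mzs))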

In [None]:
idx = 7

mz = mzs[idx]
intensity = intensities[idx]
smiles = smileses[idx]

## Look at what we have

In [None]:
print(smiles)
Chem.MolFromSmiles(smiles)

In [None]:
spec=Spectrum(mz=np.array(mz), intensities=np.array(intensity), metadata={})
spec.plot(figsize=(8,3),dpi=120)
plt.show()

## Submit the request

Set `num_candidates` to the number of alternative structures the model should try to produce. It may return fewer, but never more.

Be careful: the more candidates you ask for, the longer the calculation takes.
Typically, 10 is enough to find the best result we can give.
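
Since the demo service allows only a limited number of requests per day, it can help to catch obviously malformed input before sending anything. The following is a minimal sketch: the helper `build_payload` and the specific checks are assumptions made for illustration (the validation rules of the service itself are not documented here); only the three field names come from the request in this notebook.

```python
def build_payload(mz, intensity, num_candidates=10):
    """Validate a spectrum locally and return the JSON body for the POST request."""
    mz = [float(x) for x in mz]
    intensity = [float(x) for x in intensity]
    if not mz:
        raise ValueError("the spectrum has no peaks")
    if len(mz) != len(intensity):
        raise ValueError(f"{len(mz)} m/z values but {len(intensity)} intensities")
    if any(x < 0 for x in intensity):
        raise ValueError("intensities must not be negative")
    if int(num_candidates) < 1:
        raise ValueError("num_candidates must be at least 1")
    return {"mz": mz, "intensity": intensity, "num_candidates": int(num_candidates)}


# made-up peaks, for illustration only
payload = build_payload([41.0, 58.0, 91.0], [120, 999, 310.5], num_candidates=8)
print(payload)
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.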


In [None]:
num_candidates = 8

response_post = requests.post(api, json={
    "mz": mz,
    "intensity": intensity,
    "num_candidates" : num_candidates,
#    "smiles": "unknown"
})

if response_post.status_code == 202:
    print('POST request successful.')
    print('Response (POST):', response_post.json())
else:
    print('POST request failed with status code:', response_post.status_code)

created_post_id = response_post.json().get('id') if response_post.status_code == 202 else None

## Monitor progress of the request

Initially, the request is _queued_, then _running_, and finally either _failed_ or _completed_.

Typical calculation time is **10-60 seconds**, depending mostly on the number of alternatives requested, so be patient.

Proceed further only once you get the status _completed_.
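
Because the service runs on a best-effort basis, a request could in principle stay _queued_ for a long time. One way to avoid waiting forever is to poll with a deadline. The sketch below is an assumption-laden illustration, not the notebook's own method: `wait_for_result`, its parameters, and the fake responses are hypothetical, and the status fetcher is passed in as a function so the logic can be tried offline.

```python
import time


def wait_for_result(fetch_status, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_status() until the request completes, fails, or times out.

    fetch_status must return a dict with a 'status' key and, once the
    status is 'completed', a 'result' key.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        status = body["status"]
        if status == "completed":
            return body["result"]
        if status == "failed":
            raise RuntimeError("the service reported status 'failed'")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} s")
        sleep(interval)


# offline stand-in for the service, for illustration only
fake_responses = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "completed", "result": {"CCO": 0.9}},
])
demo_result = wait_for_result(lambda: next(fake_responses), sleep=lambda s: None)
print(demo_result)
```

Against the real service, `fetch_status` could be something like `lambda: requests.get(f"{api}/{created_post_id}").json()`, assuming the response has the same shape as the one used in this notebook.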

In [None]:
result = None
if created_post_id:
    # poll the service every 5 seconds until the request finishes
    while True:
        resp = requests.get(f"{api}/{created_post_id}")

        if resp.status_code == 200:
            body = resp.json()  # parse the response only once
            stat = body['status']
            print('Request status:', stat)
            if stat == 'completed':
                result = body['result']
                break
            elif stat == 'failed':
                # nothing to show; `result` stays None
                break
        else:
            print('HTTP:', resp.status_code)
            break
        sleep(5)
else:
    print('No valid ID, has POST above failed?')

## Look at the results

Parse the response and display it as a table sorted by the model's confidence. The top-ranked candidate can still be wrong, which is why we typically ask for several alternatives.

Then show the reference structure (if available) and compare it with the model prediction. Set `n` to choose a row in the table.
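
Besides comparing the drawings by eye, one can check programmatically whether the reference appears among the candidates and at which rank. This is a minimal sketch under stated assumptions: `rank_of_reference` is a hypothetical helper, the confidences are assumed to be numeric, and the candidate values are made up. By default it compares SMILES strings literally, which misses equivalent spellings of the same molecule.

```python
def rank_of_reference(result, reference, canonicalize=lambda s: s):
    """Return the 0-based rank of the reference among the candidates, or None.

    result maps candidate SMILES to confidence, as in the service response.
    canonicalize turns a SMILES string into a comparable form; the default
    compares the strings literally.
    """
    # candidates ordered from the most to the least confident
    ranked = sorted(result, key=result.get, reverse=True)
    target = canonicalize(reference)
    for rank, candidate in enumerate(ranked):
        if canonicalize(candidate) == target:
            return rank
    return None


# made-up candidates and confidences, for illustration only
demo = {"CCO": 0.61, "CCN": 0.27, "CCC": 0.12}
print(rank_of_reference(demo, "CCN"))  # prints 1
```

To make the comparison independent of how a SMILES string is written, one option is to pass `canonicalize=lambda s: Chem.MolToSmiles(Chem.MolFromSmiles(s))` using RDKit. Candidates that RDKit cannot parse would need extra handling, since `Chem.MolFromSmiles` returns `None` for them.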

In [None]:
r = pd.DataFrame(result.items(),columns=['SMILES','confidence'])\
    .sort_values(by='confidence',ascending=False)\
    .reset_index(drop=True)
r

In [None]:
# reference structure, if available
Chem.MolFromSmiles(smiles)

In [None]:
# the model's n-th ranked candidate (n = 0 is the most confident)
n = 0
Chem.MolFromSmiles(r.loc[n].SMILES)