# Demo for specXplore importing & session initiation

## Requirements

- Python 3.8 is required to ensure compatibility with the computational metabolomics packages used (spec2vec, ms2deepscore, ms2query, matchms).
- specXplore must be installed according to the guidelines in the [github:readme]{placeholder}

In [8]:
import matchms
import spec2vec
import specxplore.importing
import ms2query
import numpy, pandas, dash, dash_cytoscape, kmedoids
print('\n'.join(f'{m.__name__}=={m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))

matchms==0.24.1
spec2vec==0.8.0
ms2query==1.3.0
numpy==1.24.3
pandas==2.0.3
dash==2.14.2
dash_cytoscape==0.2.0


# User Input

For specXplore to run with default settings, the only input required from the user is a compatible .mgf file with the following entry style:

```text 
BEGIN IONS
FEATURE_ID=1961
PEPMASS=105.03386
SCANS=1961
RTINSECONDS=746.836
CHARGE=1+
MSLEVEL=2
51.02379 5.7E4
53.03930 4.1E5
70.08131 2.5E4
END IONS
```

Note that the key must match the string "FEATURE_ID" exactly, in all capitals. If the file does not meet this requirement, the spectral data must first be processed with matchms or by other means. In the example above, the original feature-identifying field was SCANS; a copy of SCANS was added to each entry under the key FEATURE_ID via matchms. A quick alternative is to search for all exact matches of "SCANS=" and replace them with "FEATURE_ID=" in any text editor.
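The FEATURE_ID fix described above can also be scripted. The following is a minimal sketch using only the Python standard library, not the matchms-based route used for the demo data. The function names `add_feature_id_from_scans` and `_patch_entry` are hypothetical helpers introduced here for illustration, and the sketch assumes that each entry is delimited by `BEGIN IONS` / `END IONS` and stores its identifier on a line starting with `SCANS=`. Unlike a plain search-and-replace, it keeps the SCANS line and adds a FEATURE_ID copy next to it.

```python
from typing import List


def _patch_entry(entry_lines: List[str]) -> List[str]:
    """Add a FEATURE_ID line copied from SCANS if the entry has none (hypothetical helper)."""
    if any(line.startswith("FEATURE_ID=") for line in entry_lines):
        return entry_lines  # entry is already compatible, leave it untouched
    patched = []
    for line in entry_lines:
        if line.startswith("SCANS="):
            # Insert the copy directly before the original SCANS line.
            patched.append("FEATURE_ID=" + line[len("SCANS="):])
        patched.append(line)
    return patched


def add_feature_id_from_scans(mgf_text: str) -> str:
    """Return MGF text in which every entry lacking FEATURE_ID gets one copied from SCANS."""
    output_lines: List[str] = []
    entry_lines: List[str] = []
    in_entry = False
    for line in mgf_text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            in_entry = True
            entry_lines = [line]
        elif stripped == "END IONS" and in_entry:
            entry_lines.append(line)
            output_lines.extend(_patch_entry(entry_lines))
            in_entry = False
        elif in_entry:
            entry_lines.append(line)
        else:
            output_lines.append(line)
    if in_entry:
        # Unterminated final entry: pass it through unchanged rather than drop it.
        output_lines.extend(entry_lines)
    return "\n".join(output_lines) + "\n"


# Small in-memory demonstration with an entry that only has SCANS.
sample = (
    "BEGIN IONS\n"
    "PEPMASS=105.03386\n"
    "SCANS=1961\n"
    "51.02379 5.7E4\n"
    "END IONS\n"
)
result = add_feature_id_from_scans(sample)
print(result)
```

To apply this to a file, read its contents into a string, pass the string to the function, and write the result to a new .mgf file so that the original stays untouched.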

In [9]:
data_file_path = "notebooks/data/demo_data.mgf" # USER INPUT. BOTH RELATIVE AND ABSOLUTE PATHS WILL WORK

# Pipeline

Most of the default specXplore pipeline runs automatically. Only two steps require user input: the selection of the t-SNE embedding and the selection of the k-medoid clustering(s). In both cases, the corresponding pipeline function outputs tuning grid results, and the user selects a value by setting the index of the entry to use.

In [10]:
pipeline_instance = specxplore.importing.specxploreImportingPipeline()

In [11]:
pipeline_instance.score_names

['ms2deepscore', 'spec2vec', 'modified-cosine']