# Demo for specXplore importing & session initiation

## Requirements

- Python 3.8 is required to ensure compatibility with the computational metabolomics packages used (spec2vec, ms2deepscore, ms2query, matchms).
- specXplore must be installed according to the guidelines in the [github:readme]{placeholder}
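
Because the Python version matters for this demo, a quick check before importing anything can save a confusing failure later. The following is a minimal sketch only; the helper name `is_supported_python` is hypothetical and not part of specXplore.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True if the given version tuple describes Python 3.8.x."""
    return tuple(version_info[:2]) == (3, 8)


# Warn rather than fail, since this is only a convenience check.
if not is_supported_python():
    print(
        f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}; "
        "this demo expects Python 3.8."
    )
```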

In [None]:
import specxplore.importing
from specxplore.dashboard import SpecxploreDashboard
from specxplore.session_data import load_specxplore_object_from_pickle
import os # For filepath generality across operating systems in this notebook; users may use their operating-system-specific filepaths
import matchms, spec2vec, ms2query, numpy, pandas, dash, dash_cytoscape, kmedoids, ipykernel, jupyter # loaded to show package versions
print('\n'.join(f'{m.__name__}=={m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))

# User Input

For specXplore to run with default settings, the only input required from the user is a compatible .mgf file with the following entry style:

```text 
BEGIN IONS
FEATURE_ID=1961
PEPMASS=105.03386
SCANS=1961
RTINSECONDS=746.836
CHARGE=1+
MSLEVEL=2
51.02379 5.7E4
53.03930 4.1E5
70.08131 2.5E4
END IONS
```

Note that "feature_id" must be present and unique for each spectrum. If it is not, the spectral data must first be processed with matchms or by other means. In the above example, the original feature-identifying key was SCANS; a copy of SCANS was added to each entry under the key FEATURE_ID via matchms. Alternatively, searching for all exact matches of "SCANS=" and replacing them with "FEATURE_ID=" in any text editor is a quick solution (this renames the key rather than copying it).
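
For files where a text-editor search and replace is inconvenient, the same preparation can be sketched in plain Python. This is not the matchms route used for the demo data, only an illustration under the assumption that each entry is delimited by `BEGIN IONS` and `END IONS` and carries a `SCANS=` line. The helper names `add_feature_id_from_scans` and `find_duplicate_feature_ids` are hypothetical and not part of specXplore or matchms.

```python
from collections import Counter


def _with_feature_id(entry_lines):
    """Return the entry with a FEATURE_ID line copied from SCANS, if it lacks one."""
    keys = [line.split("=", 1)[0].strip().upper() for line in entry_lines if "=" in line]
    if "FEATURE_ID" in keys:
        return entry_lines  # already present, leave the entry untouched
    result = []
    for line in entry_lines:
        if line.strip().upper().startswith("SCANS="):
            # Copy the SCANS value into a new FEATURE_ID line placed just before it.
            result.append("FEATURE_ID=" + line.split("=", 1)[1].strip())
        result.append(line)
    return result


def add_feature_id_from_scans(mgf_text):
    """Add FEATURE_ID (copied from SCANS) to every MGF entry that lacks it."""
    output_lines, entry_lines, inside_entry = [], [], False
    for line in mgf_text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            inside_entry, entry_lines = True, [line]
        elif stripped == "END IONS" and inside_entry:
            entry_lines.append(line)
            output_lines.extend(_with_feature_id(entry_lines))
            inside_entry = False
        elif inside_entry:
            entry_lines.append(line)
        else:
            output_lines.append(line)
    if inside_entry:
        output_lines.extend(entry_lines)  # keep an unterminated entry as-is
    return "\n".join(output_lines) + "\n"


def find_duplicate_feature_ids(mgf_text):
    """Return a sorted list of FEATURE_ID values that occur more than once."""
    ids = [
        line.split("=", 1)[1].strip()
        for line in mgf_text.splitlines()
        if line.strip().upper().startswith("FEATURE_ID=")
    ]
    return sorted(value for value, count in Counter(ids).items() if count > 1)


# Small in-memory demonstration; a real file would be read and written with open().
example = "BEGIN IONS\nPEPMASS=105.03386\nSCANS=1961\n51.02379 5.7E4\nEND IONS\n"
print(add_feature_id_from_scans(example))
```

Running `find_duplicate_feature_ids` on the resulting text before starting the pipeline gives an early indication of whether the uniqueness requirement is met. Note that it does not detect entries that still have no FEATURE_ID at all.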

In [None]:
data_file_path = os.path.join("data", "demo_data.mgf") # USER INPUT. MGF FILEPATH. BOTH RELATIVE AND ABSOLUTE PATHS WILL WORK
model_file_path = "models_updated_2024" # USER INPUT. MODEL DIRECTORY. BOTH RELATIVE AND ABSOLUTE PATHS WILL WORK

# Pipeline

Most of the default specXplore pipeline runs automatically. Only two steps require user input: selecting the t-SNE embedding and selecting the k-medoid clustering(s). In both cases, the corresponding pipeline function outputs tuning grid results, and the user selects a setting by specifying the index (iloc) of the desired entry.
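
An iloc is a zero-based position in the tuning grid, not the parameter value itself. The sketch below illustrates the idea in plain Python. It assumes that grid entries appear in the same order as the values passed to the pipeline function, which should be verified against the table the pipeline actually prints. The `tuning_grid` structure and the `select_by_iloc` helper are hypothetical stand-ins, not specXplore objects.

```python
perplexity_values = [5, 10, 15, 100]

# Hypothetical stand-in for a printed tuning grid: one entry per candidate value.
tuning_grid = [
    {"iloc": position, "perplexity": value}
    for position, value in enumerate(perplexity_values)
]


def select_by_iloc(grid, iloc):
    """Return the entry at zero-based position iloc, with a clear error if out of range."""
    if not 0 <= iloc < len(grid):
        raise IndexError(f"iloc {iloc} is outside the grid (valid: 0 to {len(grid) - 1})")
    return grid[iloc]


selected_entry = select_by_iloc(tuning_grid, 1)
print(selected_entry)
```

Under this ordering assumption, an iloc of 1 picks the second candidate (perplexity 10), not a perplexity of 1.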

Run the following cell to perform all pipeline steps with default settings. Note that similarity computations and ms2query may take a long time for large datasets.

In [None]:
pipeline_instance = specxplore.importing.specxploreImportingPipeline()
pipeline_instance.attach_spectra_from_file(data_file_path)
pipeline_instance.run_spectral_processing()
pipeline_instance.run_spectral_similarity_computations(model_file_path, force = True)
#pipeline_instance.run_ms2query(model_file_path, "output/ms2query_results.csv") # to run ms2query use this line of code
pipeline_instance.attach_ms2query_results("output/ms2query_results.csv") # to attach premade ms2query results use this line of code
pipeline_instance.run_and_attach_tsne_grid(perplexity_values=[5,10,15,100])
pipeline_instance.run_and_attach_kmedoid_grid([5,10,15,100])

In [None]:
features_to_highlight = ['1961', '76', '198'] # USER INPUT : ... PROVIDE SOME FEATURE_IDS AS A LIST, OR PROVIDE EMPTY LIST []
selected_tsne_iloc = 1 # USER INPUT : SELECT AN ILOC FROM TSNE OPTIMIZATION RESULTS OF PREVIOUS CELL.
selected_kmedoid_ilocs = [0, 2] # USER INPUT : SELECT ILOC(S) FROM KMEDOID OPTIMIZATION RESULTS OF PREVIOUS CELL.

In [None]:
pipeline_instance.attach_feature_highlights(features_to_highlight)
pipeline_instance.select_tsne_settings(selected_tsne_iloc)
pipeline_instance.select_kmedoid_settings(selected_kmedoid_ilocs)
pipeline_instance.export_specxplore_session_data(force = True)

In [None]:
#import specxplore.run_dashboard
#specxplore.run_dashboard.app.run_server(jupyter_mode="external")
pipeline_instance.tsne_coordinates_table

In [None]:
data = load_specxplore_object_from_pickle(filepath="output/specxplore_session_data.pickle") # using the default specXplore filepath
dashboard = SpecxploreDashboard(data) # keep a handle on the dashboard object so it can be re-run below
dashboard.run_app(jupyter_mode = "external")
#dashboard.run_app() # to run locally inside jupyter notebook.