# Demo for specXplore importing & session initiation

## Requirements

- Python 3.8 is required for compatibility with the computational metabolomics packages used (spec2vec, ms2deepscore, ms2query, matchms).
- specXplore must be installed according to the guidelines in the [github:readme]{placeholder}

In [None]:
import matchms
import spec2vec
import specxplore.importing
import ms2query
import numpy, pandas, dash, dash_cytoscape, kmedoids
print('\n'.join(f'{m.__name__}=={m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))

## User Input

For specXplore to run with default settings, the only input required from the user is a compatible .mgf file whose entries follow this style:

```text 
BEGIN IONS
FEATURE_ID=1961
PEPMASS=105.03386
SCANS=1961
RTINSECONDS=746.836
CHARGE=1+
MSLEVEL=2
51.02379 5.7E4
53.03930 4.1E5
70.08131 2.5E4
END IONS
```

Note that the key "FEATURE_ID" is strictly required: it must be in all capitals and match the string exactly. If the file does not contain this key, the spectral data has to be processed first, with matchms or by other means. In the example above, the original feature-identifying key was SCANS; a copy of SCANS was added to each entry under the key FEATURE_ID via matchms. A quick alternative is to search for all exact matches of "SCANS=" and replace them with "FEATURE_ID=" in any text editor.
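
The text-editor route described above can also be scripted. The following is a minimal sketch using only the Python standard library; it is not a specXplore or matchms function, and the helper name `add_feature_id_from_scans` is hypothetical. It copies each `SCANS=` value into a new `FEATURE_ID=` line instead of renaming the key, so the original `SCANS` entry is kept, and it leaves entries that already contain `FEATURE_ID` untouched.

```python
def add_feature_id_from_scans(mgf_text: str) -> str:
    """Return MGF text in which ion blocks with SCANS also carry FEATURE_ID.

    Blocks that already contain FEATURE_ID are left unchanged; otherwise the
    value of SCANS is copied into a new FEATURE_ID line placed above it.
    """
    output_lines = []
    block_lines = []
    inside_block = False
    for line in mgf_text.splitlines():
        if line.strip() == "BEGIN IONS":
            inside_block = True
            block_lines = [line]
        elif line.strip() == "END IONS" and inside_block:
            block_lines.append(line)
            has_feature_id = any(
                entry.startswith("FEATURE_ID=") for entry in block_lines
            )
            for entry in block_lines:
                if not has_feature_id and entry.startswith("SCANS="):
                    scan_value = entry.split("=", 1)[1]
                    output_lines.append(f"FEATURE_ID={scan_value}")
                output_lines.append(entry)
            inside_block = False
        elif inside_block:
            block_lines.append(line)
        else:
            # Lines outside any ion block (e.g. blank separators) pass through.
            output_lines.append(line)
    if inside_block:
        # Unterminated final block: keep its lines as they were.
        output_lines.extend(block_lines)
    return "\n".join(output_lines) + "\n"


# Small in-memory example mirroring the entry style shown above.
example = "BEGIN IONS\nPEPMASS=105.03386\nSCANS=1961\n51.02379 5.7E4\nEND IONS\n"
print(add_feature_id_from_scans(example))
```

To apply this to a file, read its text (for example with `pathlib.Path.read_text`), pass it through the function, and write the result to a new .mgf file so the original export stays intact.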

In [None]:
data_file_path = "data/demo_data.mgf" # USER INPUT. Both relative and absolute paths will work.
model_file_path = "models_updated_2024"

## Pipeline

Most of the default specXplore pipeline runs automatically. Only two steps require a selection by the user: the choice of the t-SNE embedding and the choice of the k-medoid clustering(s). In both cases, the corresponding pipeline function outputs the tuning grid results, and the user selects a value by setting the index of the desired entry.

In [None]:
pipeline_instance = specxplore.importing.specxploreImportingPipeline()
print(pipeline_instance.score_names)
pipeline_instance.attach_spectra_from_file(data_file_path)

In [None]:
pipeline_instance.run_spectral_processing()
pipeline_instance.spectra_matchms[0:5]

In [None]:
import copy
pipeline_copy1 = copy.deepcopy(pipeline_instance)
pipeline_copy2 = copy.deepcopy(pipeline_instance)

In [None]:
import numpy as np
array1 = specxplore.importing.compute_similarities_cosine(pipeline_instance.spectra_matchms, 'ModifiedCosine')
print(np.min(array1), np.max(array1))

array2 = specxplore.importing.compute_similarities_ms2ds(pipeline_instance.spectra_matchms, model_file_path)
print(np.min(array2), np.max(array2))

array3 = specxplore.importing.compute_similarities_s2v(pipeline_instance.spectra_matchms, model_file_path)
print(np.min(array3), np.max(array3))

pipeline_copy1.attach_spectral_similarity_arrays(array1, array2, array3)

In [None]:
pipeline_copy2.run_spectral_similarity_computations(model_file_path, force=True)
print(np.min(pipeline_copy2.primary_score), np.min(pipeline_copy2.secondary_score), np.min(pipeline_copy2.tertiary_score))

In [None]:
from ms2query.run_ms2query import run_ms2query_single_file
from ms2query.ms2library import create_library_object_from_one_dir


In [None]:
# Create an MS2Library object
ms2library = create_library_object_from_one_dir(model_file_path)

In [None]:
filename = "tmp_ms2query_annotations.csv"

# TODO: make a file exist check and create a new filename (as in ms2query) that is added to the pipeline. 
# Warn the user that an existing file exists and is not used in this run!
# add a setting for use existing filename somehow
# do this to avoid:
# new file run with new filename not caught by specXplore! 


ms2library.analog_search_store_in_csv(pipeline_instance.spectra_matchms, filename, None)
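
The TODO in the cell above could be addressed in several ways. One option is sketched below using only the standard library: check whether the target file already exists, warn if it does, and return an unused file name that can then be passed both to the annotation call and to any later step that reads the file. The helper `pick_unused_filename` and its numeric-suffix naming scheme are assumptions for illustration; they are not part of specXplore or ms2query.

```python
import warnings
from pathlib import Path


def pick_unused_filename(filename: str) -> str:
    """Return `filename` if no such file exists, else a numbered variant that is free."""
    path = Path(filename)
    if not path.exists():
        return str(path)
    counter = 1
    while True:
        # e.g. annotations.csv -> annotations_1.csv, annotations_2.csv, ...
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        if not candidate.exists():
            break
        counter += 1
    warnings.warn(
        f"'{path}' already exists and will not be reused; "
        f"results will be written to '{candidate}' instead."
    )
    return str(candidate)
```

With this helper, the cell could set `filename = pick_unused_filename("tmp_ms2query_annotations.csv")` before calling the analog search, so that the name actually written to is the same one used later in the session.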

In [None]:
import datetime
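def get_readable_timestamp() -> str:
    """Return the current local date and time as a string, with the space replaced by an underscore."""
    timestamp = str(datetime.datetime.now()).replace(" ", "_")
    return timestamp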
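
`str(datetime.datetime.now())` yields a value such as `2024-01-31 12:30:45.123456`, so the helper above returns a string that still contains colons, which are not permitted in Windows file names. If the timestamp is meant to become part of a file name, a variant along the following lines may be safer. The name `get_filename_safe_timestamp` and the chosen format string are assumptions, not part of specXplore.

```python
import datetime
from typing import Optional


def get_filename_safe_timestamp(now: Optional[datetime.datetime] = None) -> str:
    """Format a datetime as YYYY-MM-DD_HH-MM-SS, using the current time by default."""
    if now is None:
        now = datetime.datetime.now()
    return now.strftime("%Y-%m-%d_%H-%M-%S")


# A fixed datetime makes the output reproducible.
print(get_filename_safe_timestamp(datetime.datetime(2024, 1, 31, 12, 30, 45)))
# prints 2024-01-31_12-30-45
```

Dropping the microseconds keeps the name short; if several files may be created within the same second, `%f` can be appended to the format string.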