# Experiment 2: Interactive reprocessing
This experiment represents a new type of functionality that METASPACE does not currently support,
because it is uneconomical with the serverful approach. When looking for specific compounds,
scientists tend to have relatively short lists of molecules of interest, and they iteratively try
different adducts or modifiers until they find the data they are interested in.
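
To make the iterative workflow concrete, here is a minimal sketch of how the candidate combinations for one iteration could be enumerated. The formulas, adducts, modifiers, and the `candidate_ions` helper are all hypothetical illustrations and are not taken from the pipeline or its config files.

```python
from itertools import product

# Illustrative values only (assumption); not taken from the experiment's config files.
formulas = ['C6H12O6', 'C5H5N5']
adducts = ['+H', '+Na', '+K']
modifiers = ['', '-H2O']  # '' means "no modifier"

def candidate_ions(formulas, adducts, modifiers):
    """Yield one (formula, modifier, adduct) tuple per combination to try."""
    yield from product(formulas, modifiers, adducts)

candidates = list(candidate_ions(formulas, adducts, modifiers))
print(len(candidates))  # 2 formulas * 2 modifiers * 3 adducts = 12
```

Even a short molecule list multiplies quickly across adducts and modifiers, which is why each iteration needs to be fast and cheap.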

## Metrics to benchmark
* Performance:
    * **Metric:** Total processing time
    
        **Goal:** Fast enough to use interactively in a notebook - less than ~60 seconds

* Cost:
    * **Metric:** Total cost
    
        **Goal:** Significantly less than a full annotation - determined by experiment 1

# Notebook setup
Run `pip install -e .` in this directory to install all requirements for the annotation pipeline project.

In [None]:
import logging
logging.basicConfig(level=logging.INFO)

## Configuration

In [None]:
import json
# Use a context manager so the config file handle is closed after reading
with open('config.json') as f:
    config = json.load(f)

In [None]:
# Input dataset
with open('metabolomics/ds_config2.json') as f:
    input_ds = json.load(f)
# Input database (used as a template; some parameters are overridden below)
with open('metabolomics/db_config2.json') as f:
    input_db = json.load(f)

# Override databases, because this experiment expects a small database
exp_db_path = 'metabolomics/db/mol_db5.csv'
input_db['databases'] = [exp_db_path]

# Initial setup (not included in benchmark timings)

In [None]:
from annotation_pipeline.pipeline import Pipeline
pipeline = Pipeline(config, input_ds, input_db, use_ds_cache=False, use_db_cache=False)

### Load & segment dataset

In [None]:
pipeline.upload_dataset()
pipeline.load_ds()
pipeline.split_ds()
pipeline.segment_ds()

# Benchmark
Process the new molecules and run annotation.

In [None]:
from datetime import datetime
start_time = datetime.now()

# Process new molecules:
# Upload the list of molecules (in a real scenario this list would change every iteration, so this isn't part of setup)
pipeline.upload_molecular_databases()
pipeline.build_database()
pipeline.calculate_centroids()
pipeline.segment_centroids()

# Run Annotation:
pipeline.annotate()
results_df = pipeline.get_results()

finish_time = datetime.now()

In [None]:
print('start', start_time)
print('finish', finish_time)
print('duration', finish_time - start_time)
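
The measured duration can be compared against the performance goal of roughly 60 seconds stated in the metrics section. The following is a minimal sketch; `GOAL_SECONDS` and `meets_interactive_goal` are hypothetical names introduced here, and the example timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

GOAL_SECONDS = 60  # interactive-use goal from the metrics section (approximate)

def meets_interactive_goal(start, finish, goal_seconds=GOAL_SECONDS):
    """Return (elapsed_seconds, within_goal) for one benchmark run."""
    elapsed = (finish - start).total_seconds()
    return elapsed, elapsed < goal_seconds

# Made-up timestamps for illustration; in the notebook, pass start_time and finish_time.
example_start = datetime(2020, 1, 1, 12, 0, 0)
example_finish = example_start + timedelta(seconds=42.5)
elapsed, ok = meets_interactive_goal(example_start, example_finish)
print(f'elapsed {elapsed:.1f}s, within goal: {ok}')
```

In the notebook itself this would be called as `meets_interactive_goal(start_time, finish_time)` after the benchmark cell has run.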

In [None]:
# Display statistics file
from annotation_pipeline.utils import PipelineStats
PipelineStats.get()

In [None]:
# Display results
print(results_df.shape)
results_df.head()

# Clean Temp Data

In [None]:
pipeline.clean()