# Experiment 1: Typical use case

This experiment is representative of a normal use case on METASPACE, which makes it suitable for head-to-head comparisons.
The higher-spec PC used for initial data capture is a shared resource with limited availability,
so the analysis is usually performed on scientists’ or students’ lower-spec laptops.
This use case demonstrates how Lithops enables computationally demanding processes to be run easily on cloud computing resources.

### METRICS TO BENCHMARK
* Performance:
    * **Metric:** Total processing time up to downloading the results dataframe
    
        **Goal:** Faster than serverful METASPACE (including or excluding cluster start time)

    * **Metric:** Latency for retrieving all images of target ions.
    
        **Goal:** Similar to or faster than METASPACE’s Python client

* Capability:
    * **Metric:** Peak memory usage on client.
    
        **Goal:** Capable of running on a low-spec PC with 8 GB of RAM, so ~6 GB max usage

* Cost:
    * **Metric:** Cloud provider cost
    
        **Goal:** Similar price or cheaper than METASPACE (including or excluding cluster start time)

    * **Metric:** Developer time
    
        **Goal:** Less annual time required to manage cloud infrastructure than METASPACE

# Notebook setup
Run `pip install -e .` in this directory to install all requirements for the annotation pipeline project.

In [None]:
# If Lithops isn't installed, please run `pip install -e .` in this directory
import lithops
lithops.__version__

In [None]:
import logging
logging.basicConfig(level=logging.INFO)

## Configuration

In [None]:
import json

In [None]:
# Select the input dataset and database (increase/decrease config number to increase/decrease job size)
# Context managers ensure the config files are closed after loading
with open('metabolomics/ds_config2.json') as f:
    input_ds = json.load(f)
with open('metabolomics/db_config2.json') as f:
    input_db = json.load(f)

# Setup (Not included in benchmark timings)

In [None]:
from annotation_pipeline.pipeline import Pipeline

# Process database and pre-calculate centroids (not benchmarked because usually this step is cached)
pipeline = Pipeline(input_ds, input_db, use_db_cache=True, use_ds_cache=False)
pipeline(task='db')

# Benchmark

In [None]:
import os
import psutil
from datetime import datetime
memory_usage_mb = psutil.Process(os.getpid()).memory_info().rss / 2**20
print(f'Memory usage before: {memory_usage_mb:.0f}MB')

### Run Annotation Pipeline

In [None]:
start_time = datetime.now()
pipeline(task='ds')
results_df = pipeline.get_results()
finish_time = datetime.now()

In [None]:
# Number of rows in the results dataframe
len(results_df)

In [None]:
print('start', start_time)
print('finish', finish_time)
print('duration', finish_time - start_time)
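
If several stages need to be timed separately (for example, to report totals with and without a particular step), the start/finish bookkeeping above can be wrapped in a small helper. This is only a sketch: `timed` and `timings` are hypothetical names that are not part of the annotation pipeline, and the workload in the example is a stand-in for a call such as `pipeline(task='ds')`.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Store the wall-clock seconds taken by the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the enclosed block raises
        results[label] = time.perf_counter() - start

timings = {}
with timed('example', timings):
    sum(range(100_000))  # placeholder workload, not the real pipeline call

print(f"example took {timings['example']:.3f}s")
```

`time.perf_counter()` is a monotonic clock, so the measured duration is unaffected by system clock adjustments during a long run.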

In [None]:
# Display statistics file
from annotation_pipeline.utils import PipelineStats
PipelineStats.get()

In [None]:
memory_usage_mb = psutil.Process(os.getpid()).memory_info().rss / 2**20
print(f'Memory usage after: {memory_usage_mb:.0f}MB')
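
The before/after readings above are snapshots, so they can miss a transient spike during the run, whereas the Capability metric asks for peak usage. One possible way to read the peak is the standard library's `resource` module, sketched below. This assumes a Unix-like client (the module is not available on Windows); `peak_rss_mb` and `MEMORY_BUDGET_MB` are hypothetical names introduced for illustration.

```python
import resource
import sys

MEMORY_BUDGET_MB = 6 * 1024  # ~6GB goal taken from the Capability metric

def peak_rss_mb():
    """Return this process's peak resident set size in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux
    divisor = 2**20 if sys.platform == 'darwin' else 2**10
    return peak / divisor

peak_mb = peak_rss_mb()
print(f'Peak memory usage: {peak_mb:.0f}MB (budget {MEMORY_BUDGET_MB}MB)')
```

The value covers the whole lifetime of the notebook kernel process, so for a clean measurement the kernel would need to be restarted before the benchmark.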

# Get results
Download the results dataframe and images.

In [None]:
pipeline.save_results('output_dir')

# Check results are correct

This compares the generated annotations against results on https://metaspace2020.eu. For this to work, the dataset config file must have a `metaspace_id` parameter set.
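
To fail early with a clear message when the ID is missing, the config could be validated before the check is run. The sketch below assumes `metaspace_id` is a top-level key of the dataset config dictionary, which this notebook does not confirm; `require_metaspace_id` is a hypothetical helper, not part of the annotation pipeline.

```python
def require_metaspace_id(ds_config):
    """Return the dataset's METASPACE ID, or raise if it is missing or empty."""
    # Assumption: the ID is stored under a top-level 'metaspace_id' key
    metaspace_id = ds_config.get('metaspace_id')
    if not metaspace_id:
        raise ValueError("dataset config has no 'metaspace_id'; results cannot be checked")
    return metaspace_id

# Illustrative config only; the ID below is a placeholder, not a real dataset
example_config = {'metaspace_id': 'PLACEHOLDER_ID'}
print(require_metaspace_id(example_config))
```

In this notebook the helper would be called as `require_metaspace_id(input_ds)` before `pipeline.check_results()`.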

In [None]:
checked_results = pipeline.check_results()

# Clean Temp Data

In [None]:
pipeline.clean()