# Experiment 3: Stress test
As a test that the limits of this pipeline are similar to those of METASPACE, this experiment uses one of the larger datasets that has been processed.

### METRICS TO BENCHMARK
* Performance:
    * **Metric:** Total processing time

        **Goal:** similar to or faster than METASPACE (including cluster start time)
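
The goal above can be expressed as a simple check. This is only a sketch: the function name `meets_goal`, the 10% tolerance used to interpret "similar to", and the example durations are hypothetical and are not taken from METASPACE or from this project.

```python
from datetime import timedelta

def meets_goal(measured, baseline, tolerance=0.10):
    """Return True if `measured` is faster than, or within `tolerance` of, `baseline`.

    `tolerance` is an assumed interpretation of "similar to" (10% by default).
    """
    return measured <= baseline * (1 + tolerance)

# Placeholder durations for illustration only; these are not real measurements
baseline = timedelta(minutes=60)
print(meets_goal(timedelta(minutes=55), baseline))  # faster than the baseline
print(meets_goal(timedelta(minutes=70), baseline))  # more than 10% slower
```

Both durations should include cluster start time so that the comparison matches the goal as stated.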

# Notebook setup
Run `python3 setup.py install` to install all requirements for the annotation pipeline project.

In [None]:
# Raise the open-file limit so the notebook does not fail with "too many open files" errors
import resource
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('Before:', soft, hard)

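# Raise the soft limit. The hard limit can only be raised by a privileged (root) user.
# Note: setrlimit raises ValueError if the requested soft limit exceeds the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (10000, hard))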
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('After:', soft, hard)
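
On machines where the hard limit is below 10000, the call above fails. A more defensive variant might look like the following sketch. The helper name `raise_soft_nofile_limit` is hypothetical; it clamps the request to the hard limit and never lowers a soft limit that is already high enough.

```python
import resource

def raise_soft_nofile_limit(target=10000):
    """Raise the soft RLIMIT_NOFILE towards `target` without exceeding the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit"; otherwise the soft limit may not exceed the hard limit
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    # Leave the limits alone if the soft limit is already at least as high
    if soft == resource.RLIM_INFINITY or soft >= new_soft:
        return soft, hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

print('After:', raise_soft_nofile_limit(10000))
```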

In [None]:
%config Completer.use_jedi = False
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [None]:
# If pywren_ibm_cloud isn't installed, please run `pip install -e .` in this directory
import pywren_ibm_cloud as pywren

pywren.__version__

In [None]:
import logging
logging.basicConfig(level=logging.INFO)

In [None]:
# Set a socket timeout so that CF requests fail instead of hanging if they don't get a response
import socket
print('Previous timeout:', socket.getdefaulttimeout())
socket.setdefaulttimeout(60)
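
The default timeout is process-wide and applies to every socket created afterwards. If that is too broad, one option is to scope it with a context manager, as in this sketch. The name `default_socket_timeout` is hypothetical and is not part of the annotation pipeline.

```python
import socket
from contextlib import contextmanager

@contextmanager
def default_socket_timeout(seconds):
    """Temporarily set the default socket timeout, restoring the old value on exit."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield previous
    finally:
        socket.setdefaulttimeout(previous)

with default_socket_timeout(60):
    # Sockets created inside the block inherit the default timeout
    sock = socket.socket()
    print('Timeout on a new socket:', sock.gettimeout())
    sock.close()
print('Restored default:', socket.getdefaulttimeout())
```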

## Configuration

In [None]:
import json
# Use a context manager so the file handle is closed after loading
with open('config.json') as f:
    config = json.load(f)

In [None]:
# Uncomment one of the two sets of config files below:

# Largest real-world dataset, typical database
with open('metabolomics/ds_config6.json') as f:
    input_ds = json.load(f)
with open('metabolomics/db_config2.json') as f:
    input_db = json.load(f)

# Typical dataset, largest real-world database
# with open('metabolomics/ds_config2.json') as f:
#     input_ds = json.load(f)
# with open('metabolomics/db_config5.json') as f:
#     input_db = json.load(f)
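
As an alternative to commenting and uncommenting lines, the two configurations could be selected by name. The sketch below reuses the file paths from the cell above. The scenario labels and the `load_scenario` helper are hypothetical.

```python
import json
from pathlib import Path

# Paths are the ones used in the cell above; the scenario names are hypothetical labels
SCENARIOS = {
    'largest_dataset': ('metabolomics/ds_config6.json', 'metabolomics/db_config2.json'),
    'largest_database': ('metabolomics/ds_config2.json', 'metabolomics/db_config5.json'),
}

def load_scenario(name, base_dir='.'):
    """Load the (dataset, database) config pair for the named scenario."""
    ds_path, db_path = SCENARIOS[name]
    base = Path(base_dir)
    with open(base / ds_path) as f:
        ds = json.load(f)
    with open(base / db_path) as f:
        db = json.load(f)
    return ds, db
```

With the config files in place, `input_ds, input_db = load_scenario('largest_dataset')` would give the same values as the uncommented lines above.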


# Benchmark

In [None]:
import pandas as pd
from datetime import datetime
from annotation_pipeline.molecular_db import upload_mol_dbs_from_dir
from annotation_pipeline.pipeline import Pipeline

### Setup

In [None]:
# Upload databases
upload_mol_dbs_from_dir(config, config['storage']['db_bucket'], 'metabolomics/db', 'metabolomics/db')

### Build molecular database and Run Annotation Pipeline

In [None]:
start_time = datetime.now()

# Run Annotation Pipeline:
pipeline = Pipeline(config, input_ds, input_db, use_cache=False)
pipeline()
finish_time = datetime.now()

In [None]:
print('start', start_time)
print('finish', finish_time)
print('processing time', finish_time - start_time)
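
`datetime.now()` follows the system clock, which can be adjusted during a long run. For interval measurements, a monotonic clock such as `time.perf_counter()` is usually a safer choice. The following is a minimal sketch of that approach. The `stopwatch` helper is hypothetical, and `time.sleep` stands in for the call to `pipeline()`.

```python
import time
from contextlib import contextmanager
from datetime import timedelta

@contextmanager
def stopwatch(label):
    """Measure the wall-clock duration of the enclosed block with a monotonic clock."""
    start = time.perf_counter()
    result = {}
    try:
        yield result
    finally:
        # Record and print the duration even if the block raises
        result['elapsed'] = timedelta(seconds=time.perf_counter() - start)
        print(label, 'took', result['elapsed'])

with stopwatch('example step') as timing:
    time.sleep(0.1)  # stand-in for pipeline()
```

The measured duration remains available afterwards as `timing['elapsed']`.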

In [None]:
# Optionally get results (not part of the benchmark, but useful for debugging)
results_df = pipeline.get_results()
images_dict = pipeline.get_images()

In [None]:
# Display PyWren statistics file
from annotation_pipeline.utils import get_pywren_stats
get_pywren_stats()

# Clean Temp Data

In [None]:
pipeline.clean()