# Initial requirements

This notebook requires IBM Cloud Object Storage and IBM Cloud Functions.
Please use the IBM Cloud dashboard to create both services.


### IBM COS Setup

Copy the file `config.json.template` to `config.json` and fill in the missing values for API keys, buckets and endpoints per these instructions:

Set up a bucket in IBM Cloud Object Storage

You need an IBM COS bucket to store the input data. If you don't remember your existing buckets or would like to create a new one, navigate to your cloud resource list, then find and select your storage instance. From there you can view all your buckets and create a new bucket in the region you prefer. Make sure you copy the correct endpoint for the bucket from the Endpoint tab of the COS service dashboard. Note: bucket names must be unique.
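A rejected bucket name is an easy first stumbling block. The hypothetical helper below is only a local pre-check, assuming the usual DNS-style naming rules for S3-compatible stores (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; not formatted like an IP address). It cannot check uniqueness, and the IBM COS documentation remains the authoritative list of rules.

```python
import ipaddress
import re

# Assumed DNS-style rules: 3-63 chars of [a-z0-9.-], first and last char a letter or digit
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the assumed bucket naming rules (uniqueness not checked)."""
    if not _BUCKET_RE.match(name):
        return False
    try:
        ipaddress.ip_address(name)
        return False  # names shaped like an IP address are rejected
    except ValueError:
        return True
```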

Obtain the API key and endpoint for the IBM Cloud Functions service: navigate to Getting Started > API Key in the side menu and copy the values for "Current Namespace", "Host" and "Key" into `config.json`. Make sure to prefix the host with "https://" when using it as the endpoint.
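A sketch of a sanity check for `config.json`, assuming a layout with an `ibm_cos` section (the `api_key` and `endpoint` keys this notebook reads) and an `ibm_cf` section holding the Cloud Functions `endpoint`, `namespace` and `api_key`. The `ibm_cf` key names and the helpers `normalize_endpoint` and `check_config` are assumptions for illustration, not part of PyWren's API; `normalize_endpoint` applies the "https://" rule above.

```python
# Assumed config.json layout; only the 'ibm_cos' keys are confirmed by this notebook.
REQUIRED_KEYS = {
    "ibm_cf": ("endpoint", "namespace", "api_key"),
    "ibm_cos": ("endpoint", "api_key"),
}


def normalize_endpoint(host: str) -> str:
    """Prefix a bare host such as 'eu-de.functions.cloud.ibm.com' with https://."""
    host = host.strip().rstrip("/")
    return host if host.startswith(("http://", "https://")) else "https://" + host


def check_config(config: dict) -> list:
    """Return missing or empty 'section.key' entries; normalize endpoints in place."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section, {})
        for key in keys:
            if not values.get(key):
                missing.append(f"{section}.{key}")
        if values.get("endpoint"):
            values["endpoint"] = normalize_endpoint(values["endpoint"])
    return missing
```

For example, calling `check_config(config)` right after loading `config.json` returns an empty list when nothing is missing.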

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Check which Python interpreter and environment prefix this notebook uses
import sys
sys.executable, sys.prefix

('/opt/dev/miniconda3/envs/pywren-ibm/bin/python',
 '/opt/dev/miniconda3/envs/pywren-ibm')

In [3]:
# Install PyWren-IBM if needed
try:
    import pywren_ibm_cloud as pywren
except ModuleNotFoundError:    
    !{sys.executable} -m pip install -U pywren-ibm-cloud==1.0.10
    import pywren_ibm_cloud as pywren

pywren.__version__

'1.0.10'

In [4]:
# Raise the open-file limit to avoid "Too many open files" errors in the notebook
import resource
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('Before:', soft, hard)

# Raise the soft limit up to the hard limit; only privileged users can raise the hard limit itself
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('After:', soft, hard)

Before: 1024 4096
After: 4096 4096
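Copying the hard limit into the soft limit assumes the hard limit is a finite number. On some systems (macOS in particular) the hard limit may be reported as unlimited, and requesting an unlimited soft limit can then be rejected. A more defensive sketch, where `raise_nofile_limit` and its default target of 4096 are illustrative choices:

```python
import resource


def raise_nofile_limit(target: int = 4096):
    """Raise the soft open-file limit towards `target`, never above the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    # An unlimited hard limit gives no concrete number to copy, so fall back to the target
    cap = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if cap > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (cap, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```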


In [5]:
%config Completer.use_jedi = False
%matplotlib inline

In [7]:
from matplotlib import pyplot as plt
from scipy.sparse import coo_matrix
from collections import defaultdict
from pyImagingMSpec.image_measures import isotope_image_correlation, isotope_pattern_match
from cpyImagingMSpec import measure_of_chaos
from itertools import chain
from pathlib import Path
import numpy as np
import pandas as pd
import pickle
import sys
import io

In [8]:
import logging
logging.basicConfig(level=logging.DEBUG)

In [9]:
import json

with open('config.json') as f:
    config = json.load(f)

### Input Files Setup

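Copy the file `input_config.json.template` to `input_config.json` and fill in the missing bucket values. The next cell loads one of the prepared configs under `metabolomics/` (small, big or huge); uncomment the one you want, or point it at your own `input_config.json`.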
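Before running the pipeline it can help to check that the chosen input config contains every key the later cells use. The expected keys below are copied from the printed `input_data` and `input_db` dictionaries; `missing_input_keys` is a hypothetical helper.

```python
# Keys shown in the printed input_data ('dataset') and input_db ('molecular_db') dictionaries
EXPECTED_INPUT_KEYS = {
    "dataset": ["bucket", "path", "ds_segments", "centr_segments", "formula_images"],
    "molecular_db": ["bucket", "formulas_chunks", "centroids_pandas",
                     "databases", "adducts", "modifiers"],
}


def missing_input_keys(input_config: dict) -> list:
    """List 'section.key' entries absent from an input config dictionary."""
    return [
        f"{section}.{key}"
        for section, keys in EXPECTED_INPUT_KEYS.items()
        for key in keys
        if key not in input_config.get(section, {})
    ]
```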

In [11]:
# input_config = json.load(open('metabolomics/input_config_small.json'))
input_config = json.load(open('metabolomics/input_config_big.json'))
# input_config = json.load(open('metabolomics/input_config_huge.json'))
input_data = input_config['dataset']
input_db = input_config['molecular_db']

In [12]:
input_data

{'bucket': 'pywren-annotation-pipeline',
 'path': 'metabolomics/ds/AZ_Rat_brains',
 'ds_segments': 'metabolomics/tmp/ds_segments',
 'centr_segments': 'metabolomics/tmp/centr_segments',
 'formula_images': 'metabolomics/tmp/formula_images'}

In [13]:
input_db

{'bucket': 'pywren-annotation-pipeline',
 'formulas_chunks': 'metabolomics/db/formulas',
 'centroids_pandas': 'metabolomics/db/centroids.pickle',
 'databases': ['metabolomics/db/mol_db1.pickle'],
 'adducts': ['', '+H', '+Na', '+K'],
 'modifiers': ['', '-H2O', '-CO2', '-NH3']}

In [14]:
import ibm_boto3
from ibm_botocore.client import Config
from ibm_botocore.client import ClientError

In [15]:
cos_client = ibm_boto3.client(service_name='s3',
                              ibm_api_key_id=config['ibm_cos']['api_key'],
#                               ibm_auth_endpoint=config['ibm_cos']['auth_endpoint'],
                              config=Config(signature_version='oauth'),
                              endpoint_url=config['ibm_cos']['endpoint'])

# Upload test data into the COS bucket

In [93]:
import os
from annotation_pipeline_v2.utils import upload_to_cos

In [94]:
for root, dirnames, filenames in os.walk(input_data['path']):
    for fn in filenames:
        f_path = f'{root}/{fn}'
        print(f_path)
        upload_to_cos(cos_client, f_path, input_config['dataset']['bucket'], f_path)

metabolomics/ds/AZ_Rat_brains/Image5.imzML
Copying from metabolomics/ds/AZ_Rat_brains/Image5.imzML to pywren-annotation-pipeline/metabolomics/ds/AZ_Rat_brains/Image5.imzML
Copy completed for pywren-annotation-pipeline/metabolomics/ds/AZ_Rat_brains/Image5.imzML
metabolomics/ds/AZ_Rat_brains/Image5.ibd
Copying from metabolomics/ds/AZ_Rat_brains/Image5.ibd to pywren-annotation-pipeline/metabolomics/ds/AZ_Rat_brains/Image5.ibd
Copy completed for pywren-annotation-pipeline/metabolomics/ds/AZ_Rat_brains/Image5.ibd
metabolomics/ds/AZ_Rat_brains/meta.json
Copying from metabolomics/ds/AZ_Rat_brains/meta.json to pywren-annotation-pipeline/metabolomics/ds/AZ_Rat_brains/meta.json
Copy completed for pywren-annotation-pipeline/metabolomics/ds/AZ_Rat_brains/meta.json
metabolomics/ds/AZ_Rat_brains/config.json
Copying from metabolomics/ds/AZ_Rat_brains/config.json to pywren-annotation-pipeline/metabolomics/ds/AZ_Rat_brains/config.json
Copy completed for pywren-annotation-pipeline/metabolomics/ds/AZ_Rat
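The upload loop keeps each file's relative local path as its object key, so `metabolomics/ds/AZ_Rat_brains/Image5.ibd` lands under the same key in the bucket. The sketch below separates that mapping from the upload itself. `upload_to_cos` is the project's own helper and its internals are not shown here; instead this sketch assumes the boto3-style managed `upload_file(Filename, Bucket, Key)` call, which should be available on `ibm_boto3` clients since that SDK is a boto3 fork. Both function names are hypothetical.

```python
import os
from pathlib import Path


def local_files_to_keys(local_dir: str):
    """Yield (local_path, object_key) pairs for every file under local_dir.

    Keys reuse the local path with forward slashes; pass a relative
    directory (as the notebook does) to get relative keys.
    """
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            yield path, Path(path).as_posix()


def upload_dir(cos_client, bucket: str, local_dir: str):
    """Upload each file under local_dir using the client's upload_file transfer."""
    for path, key in local_files_to_keys(local_dir):
        cos_client.upload_file(Filename=path, Bucket=bucket, Key=key)
```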

# Generate Isotopic Peaks from Molecular Databases

In [29]:
from annotation_pipeline.molecular_db import dump_mol_db, build_database, \
    calculate_centroids, clean_formula_chunks

In [15]:
# Download commonly used mol DBs from METASPACE (add force=True to redownload if needed)
dump_mol_db(config, input_db['bucket'], 'metabolomics/db/mol_db1.pickle', 22) #HMDB-v4
dump_mol_db(config, input_db['bucket'], 'metabolomics/db/mol_db2.pickle', 19) #ChEBI-2018-01
dump_mol_db(config, input_db['bucket'], 'metabolomics/db/mol_db3.pickle', 24) #LipidMaps-2017-12-12
dump_mol_db(config, input_db['bucket'], 'metabolomics/db/mol_db4.pickle', 26) #SwissLipids-2018-02-02

In [16]:
#%%time
num_formulas, formula_chunk_keys = build_database(config, input_db)

2019-05-07 19:22:55,871 [DEBUG] pywren_ibm_cloud.storage.backends.cos: Set IBM COS Endpoint to https://s3.eu-de.cloud-object-storage.appdomain.cloud
2019-05-07 19:22:55,872 [DEBUG] pywren_ibm_cloud.storage.backends.cos: Set IBM COS Auth Endpoint to https://iam.cloud.ibm.com/oidc/token
2019-05-07 19:22:55,873 [DEBUG] pywren_ibm_cloud.storage.backends.cos: IBM COS: Using api_key - Requesting new token
2019-05-07 19:22:56,126 [INFO] pywren_ibm_cloud.cf_connector: IBM Cloud Functions init for namespace: kovalev@embl.de_dev
2019-05-07 19:22:56,126 [INFO] pywren_ibm_cloud.cf_connector: IBM Cloud Functions init for host: https://eu-de.functions.cloud.ibm.com
2019-05-07 19:22:56,127 [DEBUG] pywren_ibm_cloud.cf_connector: CF user agent set to: python-requests/2.21.0 pywren-ibm-cloud
2019-05-07 19:22:56,128 [INFO] pywren_ibm_cloud.invokers: IBM Cloud Functions init for Runtime: ibmfunctions/pywren-metabolomics:3.6 - 2048MB
2019-05-07 19:22:56,128 [DEBUG] pywren_ibm_cloud.runtime.metadata: Downlo

2019-05-07 19:23:01,068 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00031 - Activation ID: ffc585ac86b641a58585ac86b621a539 - Time: 0.131 seconds
2019-05-07 19:23:01,069 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00054 - Activation ID: 39b6e89a42a04b50b6e89a42a0db5045 - Time: 0.042 seconds
2019-05-07 19:23:01,072 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00035 - Activation ID: 50e021299f724f20a021299f729f20de - Time: 0.126 seconds
2019-05-07 19:23:01,075 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00005 - Activation ID: 923554b8ee5c43deb554b8ee5ce3de11 - Time: 0.206 seconds
2019-05-07 19:23:01,078 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00023 - Activation ID: 552926d0069a4812a926d0069a981258 - Time: 0.165 seconds
2019-05-07 19:23:01,080 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00001 - Activation ID:

2019-05-07 19:23:01,201 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00072 - Activation ID: e1cacb244ea746808acb244ea7b680a5 - Time: 0.049 seconds
2019-05-07 19:23:01,203 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00070 - Activation ID: e58317d16558474c8317d16558974c38 - Time: 0.085 seconds
2019-05-07 19:23:01,214 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00074 - Activation ID: b809c28556f64db989c28556f6fdb97b - Time: 0.049 seconds
2019-05-07 19:23:01,224 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00076 - Activation ID: b06c98f5184e45bcac98f5184e15bce4 - Time: 0.054 seconds
2019-05-07 19:23:01,228 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00083 - Activation ID: 6ec1c873350a432781c873350a1327e3 - Time: 0.032 seconds
2019-05-07 19:23:01,231 [DEBUG] pywren_ibm_cloud.cf_connector: Executor ID fad9eb55-eb1c Function 00075 - Activation ID:

In [18]:
num_formulas, len(formula_chunk_keys), formula_chunk_keys[:3]

(2406009,
 256,
 ['metabolomics/db/formulas/0.pickle',
  'metabolomics/db/formulas/1.pickle',
  'metabolomics/db/formulas/2.pickle'])
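`build_database` is not shown here, so the following is only a guess at the idea: each formula from the molecular databases is combined with every entry of `input_db['modifiers']` and `input_db['adducts']` (the empty string meaning "none"), and the resulting list is split into chunks stored under keys like the `metabolomics/db/formulas/<i>.pickle` ones printed above. Both helpers are hypothetical, and the real pipeline may filter or format formulas differently. With an even split, 2406009 formulas over 256 chunks would give 121 chunks of 9399 formulas and 135 chunks of 9398.

```python
from itertools import product


def expand_formulas(base_formulas, modifiers, adducts):
    """Combine every base formula with every modifier and adduct (hypothetical scheme).

    '' entries mean "no modifier" / "no adduct"; duplicates are removed.
    """
    combos = {f + m + a for f, m, a in product(base_formulas, modifiers, adducts)}
    return sorted(combos)


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks
```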

In [19]:
# %%time
centroids_shape, centroids_head = calculate_centroids(config, input_db, formula_chunk_keys)

In [20]:
centroids_shape

(9624036, 3)

In [21]:
centroids_head

| formula_i | peak_i | mz | int |
|---|---|---|---|
| 0 | 0 | 186.8653 | 100.0 |
| 0 | 1 | 187.871573 | 0.057511 |
| 0 | 2 | 188.864972 | 92.851404 |
| 0 | 3 | 189.871182 | 0.053439 |
| 1 | 0 | 202.860214 | 100.0 |
| 1 | 1 | 203.865913 | 0.070423 |
| 1 | 2 | 204.859832 | 92.940336 |
| 1 | 3 | 206.864036 | 0.190792 |
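The centroid table has 9624036 rows, exactly four times `num_formulas` (2406009), which is consistent with four isotopic peaks per formula (`peak_i` 0 to 3 in the preview above). A stdlib sketch that groups rows like those in the preview into per-formula peak lists; `peaks_by_formula` is hypothetical, and with pandas a `groupby` on the `formula_i` index would do the same job.

```python
from collections import defaultdict

# The eight preview rows from centroids_head as (formula_i, peak_i, mz, int)
preview_rows = [
    (0, 0, 186.8653, 100.0),
    (0, 1, 187.871573, 0.057511),
    (0, 2, 188.864972, 92.851404),
    (0, 3, 189.871182, 0.053439),
    (1, 0, 202.860214, 100.0),
    (1, 1, 203.865913, 0.070423),
    (1, 2, 204.859832, 92.940336),
    (1, 3, 206.864036, 0.190792),
]


def peaks_by_formula(rows):
    """Group (formula_i, peak_i, mz, int) rows into {formula_i: [(mz, int), ...]}.

    Sorting the tuples first orders the peaks by peak_i within each formula.
    """
    grouped = defaultdict(list)
    for formula_i, _peak_i, mz, intensity in sorted(rows):
        grouped[formula_i].append((mz, intensity))
    return dict(grouped)


peaks = peaks_by_formula(preview_rows)
```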


In [30]:
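# Download centroids.pickle to the local machine, reusing the COS key as the local path
local_path = Path(input_db['centroids_pandas'])
local_path.parent.mkdir(parents=True, exist_ok=True)  # create metabolomics/db/ if it is missing
resp = cos_client.get_object(Bucket=input_db['bucket'], Key=input_db['centroids_pandas'])
with open(local_path, 'wb') as f:
    f.write(resp['Body'].read())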

In [22]:
clean_formula_chunks(config, input_db, formula_chunk_keys)

# Run Annotation Pipeline

In [16]:
from annotation_pipeline_v2.pipeline import Pipeline

In [22]:
pipeline = Pipeline(input_config)

In [23]:
%time pipeline.load_ds()

INFO:annotation-pipeline:Parsed imzml: 32224 spectra found


CPU times: user 3.86 s, sys: 20.1 ms, total: 3.88 s
Wall time: 3.89 s


In [24]:
%time pipeline.segment_ds()

INFO:annotation-pipeline:Defining dataset segment bounds
INFO:annotation-pipeline:Generated 282 dataset segments: [79.99772644 86.09687805]...[494.32427979 499.97991943]
INFO:annotation-pipeline:Segmenting dataset into 282 segments
DEBUG:annotation-pipeline:Segmenting spectra chunk 0
DEBUG:annotation-pipeline:Segmenting spectra chunk 1
DEBUG:annotation-pipeline:Segmenting spectra chunk 2
DEBUG:annotation-pipeline:Segmenting spectra chunk 3
DEBUG:annotation-pipeline:Segmenting spectra chunk 4
DEBUG:annotation-pipeline:Segmenting spectra chunk 5
DEBUG:annotation-pipeline:Segmenting spectra chunk 6


CPU times: user 21 s, sys: 2.47 s, total: 23.4 s
Wall time: 23.5 s
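The log reports only the first and last of the 282 dataset segment bounds. One plausible way to choose such bounds, assumed here rather than taken from the pipeline, is to sample peak m/z values and cut at evenly spaced quantiles so that each segment holds roughly the same number of peaks; `bisect` then finds a peak's segment. Both helpers are hypothetical.

```python
import bisect


def segment_bounds(sample_mzs, n_segments):
    """Hypothetical: cut sorted sample m/z values at evenly spaced quantiles.

    Returns n_segments (lo, hi) pairs; each segment holds roughly
    len(sample_mzs) / n_segments sampled peaks.
    """
    mzs = sorted(sample_mzs)
    n = len(mzs)
    if n < n_segments:
        raise ValueError("need at least as many sampled peaks as segments")
    # Lower edge of segment i is the sample value at index i * n // n_segments
    edges = [mzs[(i * n) // n_segments] for i in range(n_segments)] + [mzs[-1]]
    return list(zip(edges[:-1], edges[1:]))


def segment_index(bounds, mz):
    """Index of the last segment whose lower bound is <= mz (clamped to 0)."""
    lows = [lo for lo, _hi in bounds]
    return max(0, bisect.bisect_right(lows, mz) - 1)
```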


In [25]:
%time pipeline.segment_centroids()

INFO:annotation-pipeline:Prepared 3695968 centroids
INFO:annotation-pipeline:Segmenting centroids into 369 segments


CPU times: user 6.33 s, sys: 895 ms, total: 7.23 s
Wall time: 7.23 s


In [26]:
%time formula_metrics_df = pipeline.annotate()

INFO:annotation-pipeline:Annotating...
DEBUG:annotation-pipeline:Reading centroids segment 0 from metabolomics/tmp/centr_segments/centr_segm_0000.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 0-6
DEBUG:annotation-pipeline:Saving 21 images
DEBUG:annotation-pipeline:Segment 0 finished
DEBUG:annotation-pipeline:Reading centroids segment 1 from metabolomics/tmp/centr_segments/centr_segm_0001.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 3-10
DEBUG:annotation-pipeline:Saving 69 images
DEBUG:annotation-pipeline:Segment 1 finished
DEBUG:annotation-pipeline:Reading centroids segment 2 from metabolomics/tmp/centr_segments/centr_segm_0002.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 8-12
DEBUG:annotation-pipeline:Saving 52 images
DEBUG:annotation-pipeline:Segment 2 finished
DEBUG:annotation-pipeline:Reading centroids segment 3 from metabolomics/tmp/centr_segments/centr_segm_0003.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 11-15
DEBUG:annotat

DEBUG:annotation-pipeline:Reading dataset segments 63-71
DEBUG:annotation-pipeline:Saving 225 images
DEBUG:annotation-pipeline:Segment 31 finished
DEBUG:annotation-pipeline:Reading centroids segment 32 from metabolomics/tmp/centr_segments/centr_segm_0032.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 65-73
DEBUG:annotation-pipeline:Saving 306 images
DEBUG:annotation-pipeline:Segment 32 finished
DEBUG:annotation-pipeline:Reading centroids segment 33 from metabolomics/tmp/centr_segments/centr_segm_0033.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 67-74
DEBUG:annotation-pipeline:Saving 206 images
DEBUG:annotation-pipeline:Segment 33 finished
DEBUG:annotation-pipeline:Reading centroids segment 34 from metabolomics/tmp/centr_segments/centr_segm_0034.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 69-76
DEBUG:annotation-pipeline:Saving 139 images
DEBUG:annotation-pipeline:Segment 34 finished
DEBUG:annotation-pipeline:Reading centroids segment 35 from meta

DEBUG:annotation-pipeline:Saving 375 images
DEBUG:annotation-pipeline:Segment 62 finished
DEBUG:annotation-pipeline:Reading centroids segment 63 from metabolomics/tmp/centr_segments/centr_segm_0063.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 109-117
DEBUG:annotation-pipeline:Saving 139 images
DEBUG:annotation-pipeline:Segment 63 finished
DEBUG:annotation-pipeline:Reading centroids segment 64 from metabolomics/tmp/centr_segments/centr_segm_0064.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 109-117
DEBUG:annotation-pipeline:Saving 208 images
DEBUG:annotation-pipeline:Segment 64 finished
DEBUG:annotation-pipeline:Reading centroids segment 65 from metabolomics/tmp/centr_segments/centr_segm_0065.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 110-119
DEBUG:annotation-pipeline:Saving 120 images
DEBUG:annotation-pipeline:Segment 65 finished
DEBUG:annotation-pipeline:Reading centroids segment 66 from metabolomics/tmp/centr_segments/centr_segm_0066.msgpack

DEBUG:annotation-pipeline:Saving 72 images
DEBUG:annotation-pipeline:Segment 93 finished
DEBUG:annotation-pipeline:Reading centroids segment 94 from metabolomics/tmp/centr_segments/centr_segm_0094.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 140-148
DEBUG:annotation-pipeline:Saving 110 images
DEBUG:annotation-pipeline:Segment 94 finished
DEBUG:annotation-pipeline:Reading centroids segment 95 from metabolomics/tmp/centr_segments/centr_segm_0095.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 142-150
DEBUG:annotation-pipeline:Saving 88 images
DEBUG:annotation-pipeline:Segment 95 finished
DEBUG:annotation-pipeline:Reading centroids segment 96 from metabolomics/tmp/centr_segments/centr_segm_0096.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 143-151
DEBUG:annotation-pipeline:Saving 154 images
DEBUG:annotation-pipeline:Segment 96 finished
DEBUG:annotation-pipeline:Reading centroids segment 97 from metabolomics/tmp/centr_segments/centr_segm_0097.msgpack
D

DEBUG:annotation-pipeline:Reading dataset segments 168-175
DEBUG:annotation-pipeline:Saving 114 images
DEBUG:annotation-pipeline:Segment 124 finished
DEBUG:annotation-pipeline:Reading centroids segment 125 from metabolomics/tmp/centr_segments/centr_segm_0125.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 169-177
DEBUG:annotation-pipeline:Saving 73 images
DEBUG:annotation-pipeline:Segment 125 finished
DEBUG:annotation-pipeline:Reading centroids segment 126 from metabolomics/tmp/centr_segments/centr_segm_0126.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 169-178
DEBUG:annotation-pipeline:Saving 139 images
DEBUG:annotation-pipeline:Segment 126 finished
DEBUG:annotation-pipeline:Reading centroids segment 127 from metabolomics/tmp/centr_segments/centr_segm_0127.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 171-179
DEBUG:annotation-pipeline:Saving 78 images
DEBUG:annotation-pipeline:Segment 127 finished
DEBUG:annotation-pipeline:Reading centroids segment

DEBUG:annotation-pipeline:Reading dataset segments 192-201
DEBUG:annotation-pipeline:Saving 100 images
DEBUG:annotation-pipeline:Segment 155 finished
DEBUG:annotation-pipeline:Reading centroids segment 156 from metabolomics/tmp/centr_segments/centr_segm_0156.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 193-202
DEBUG:annotation-pipeline:Saving 168 images
DEBUG:annotation-pipeline:Segment 156 finished
DEBUG:annotation-pipeline:Reading centroids segment 157 from metabolomics/tmp/centr_segments/centr_segm_0157.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 194-202
DEBUG:annotation-pipeline:Saving 9 images
DEBUG:annotation-pipeline:Segment 157 finished
DEBUG:annotation-pipeline:Reading centroids segment 158 from metabolomics/tmp/centr_segments/centr_segm_0158.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 194-203
DEBUG:annotation-pipeline:Saving 137 images
DEBUG:annotation-pipeline:Segment 158 finished
DEBUG:annotation-pipeline:Reading centroids segment

DEBUG:annotation-pipeline:Reading dataset segments 216-223
DEBUG:annotation-pipeline:Saving 127 images
DEBUG:annotation-pipeline:Segment 186 finished
DEBUG:annotation-pipeline:Reading centroids segment 187 from metabolomics/tmp/centr_segments/centr_segm_0187.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 217-224
DEBUG:annotation-pipeline:Saving 86 images
DEBUG:annotation-pipeline:Segment 187 finished
DEBUG:annotation-pipeline:Reading centroids segment 188 from metabolomics/tmp/centr_segments/centr_segm_0188.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 217-224
DEBUG:annotation-pipeline:Saving 186 images
DEBUG:annotation-pipeline:Segment 188 finished
DEBUG:annotation-pipeline:Reading centroids segment 189 from metabolomics/tmp/centr_segments/centr_segm_0189.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 218-225
DEBUG:annotation-pipeline:Saving 24 images
DEBUG:annotation-pipeline:Segment 189 finished
DEBUG:annotation-pipeline:Reading centroids segment

DEBUG:annotation-pipeline:Reading dataset segments 234-239
DEBUG:annotation-pipeline:Saving 7 images
DEBUG:annotation-pipeline:Segment 217 finished
DEBUG:annotation-pipeline:Reading centroids segment 218 from metabolomics/tmp/centr_segments/centr_segm_0218.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 234-240
DEBUG:annotation-pipeline:Saving 39 images
DEBUG:annotation-pipeline:Segment 218 finished
DEBUG:annotation-pipeline:Reading centroids segment 219 from metabolomics/tmp/centr_segments/centr_segm_0219.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 235-240
DEBUG:annotation-pipeline:Saving 118 images
DEBUG:annotation-pipeline:Segment 219 finished
DEBUG:annotation-pipeline:Reading centroids segment 220 from metabolomics/tmp/centr_segments/centr_segm_0220.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 236-241
DEBUG:annotation-pipeline:Saving 74 images
DEBUG:annotation-pipeline:Segment 220 finished
DEBUG:annotation-pipeline:Reading centroids segment 2

DEBUG:annotation-pipeline:Reading dataset segments 249-252
DEBUG:annotation-pipeline:Saving 2 images
DEBUG:annotation-pipeline:Segment 248 finished
DEBUG:annotation-pipeline:Reading centroids segment 249 from metabolomics/tmp/centr_segments/centr_segm_0249.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 249-253
DEBUG:annotation-pipeline:Saving 81 images
DEBUG:annotation-pipeline:Segment 249 finished
DEBUG:annotation-pipeline:Reading centroids segment 250 from metabolomics/tmp/centr_segments/centr_segm_0250.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 249-253
DEBUG:annotation-pipeline:Saving 132 images
DEBUG:annotation-pipeline:Segment 250 finished
DEBUG:annotation-pipeline:Reading centroids segment 251 from metabolomics/tmp/centr_segments/centr_segm_0251.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 250-253
DEBUG:annotation-pipeline:Saving 0 images
DEBUG:annotation-pipeline:Segment 251 finished
DEBUG:annotation-pipeline:Reading centroids segment 25

DEBUG:annotation-pipeline:Reading dataset segments 259-263
DEBUG:annotation-pipeline:Saving 122 images
DEBUG:annotation-pipeline:Segment 279 finished
DEBUG:annotation-pipeline:Reading centroids segment 280 from metabolomics/tmp/centr_segments/centr_segm_0280.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 260-263
DEBUG:annotation-pipeline:Saving 7 images
DEBUG:annotation-pipeline:Segment 280 finished
DEBUG:annotation-pipeline:Reading centroids segment 281 from metabolomics/tmp/centr_segments/centr_segm_0281.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 260-264
DEBUG:annotation-pipeline:Saving 45 images
DEBUG:annotation-pipeline:Segment 281 finished
DEBUG:annotation-pipeline:Reading centroids segment 282 from metabolomics/tmp/centr_segments/centr_segm_0282.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 261-264
DEBUG:annotation-pipeline:Saving 168 images
DEBUG:annotation-pipeline:Segment 282 finished
DEBUG:annotation-pipeline:Reading centroids segment 

DEBUG:annotation-pipeline:Reading dataset segments 268-271
DEBUG:annotation-pipeline:Saving 38 images
DEBUG:annotation-pipeline:Segment 310 finished
DEBUG:annotation-pipeline:Reading centroids segment 311 from metabolomics/tmp/centr_segments/centr_segm_0311.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 268-272
DEBUG:annotation-pipeline:Saving 107 images
DEBUG:annotation-pipeline:Segment 311 finished
DEBUG:annotation-pipeline:Reading centroids segment 312 from metabolomics/tmp/centr_segments/centr_segm_0312.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 269-272
DEBUG:annotation-pipeline:Saving 36 images
DEBUG:annotation-pipeline:Segment 312 finished
DEBUG:annotation-pipeline:Reading centroids segment 313 from metabolomics/tmp/centr_segments/centr_segm_0313.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 269-272
DEBUG:annotation-pipeline:Saving 172 images
DEBUG:annotation-pipeline:Segment 313 finished
DEBUG:annotation-pipeline:Reading centroids segment

DEBUG:annotation-pipeline:Reading dataset segments 276-278
DEBUG:annotation-pipeline:Saving 173 images
DEBUG:annotation-pipeline:Segment 341 finished
DEBUG:annotation-pipeline:Reading centroids segment 342 from metabolomics/tmp/centr_segments/centr_segm_0342.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 277-278
DEBUG:annotation-pipeline:Saving 4 images
DEBUG:annotation-pipeline:Segment 342 finished
DEBUG:annotation-pipeline:Reading centroids segment 343 from metabolomics/tmp/centr_segments/centr_segm_0343.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 277-279
DEBUG:annotation-pipeline:Saving 51 images
DEBUG:annotation-pipeline:Segment 343 finished
DEBUG:annotation-pipeline:Reading centroids segment 344 from metabolomics/tmp/centr_segments/centr_segm_0344.msgpack
DEBUG:annotation-pipeline:Reading dataset segments 277-279
DEBUG:annotation-pipeline:Saving 112 images
DEBUG:annotation-pipeline:Segment 344 finished
DEBUG:annotation-pipeline:Reading centroids segment 

CPU times: user 17min 29s, sys: 11.6 s, total: 17min 41s
Wall time: 17min 41s
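In the annotation log, each centroid segment reads a contiguous range of dataset segments (for example `0-6` for segment 0 and `3-10` for segment 1), and consecutive ranges overlap. A sketch of how such a range could be found by binary search over sorted, adjacent dataset segment bounds, treating segments that merely touch `[lo, hi]` as overlapping. A real implementation may also widen `[lo, hi]` by the m/z tolerance, which is left out here; `overlapping_segments` is hypothetical.

```python
import bisect


def overlapping_segments(ds_bounds, lo, hi):
    """Return (first, last) indices of dataset segments overlapping [lo, hi].

    ds_bounds is a sorted list of adjacent (seg_lo, seg_hi) m/z pairs.
    """
    lows = [b[0] for b in ds_bounds]
    highs = [b[1] for b in ds_bounds]
    first = bisect.bisect_left(highs, lo)      # first segment ending at or after lo
    last = bisect.bisect_right(lows, hi) - 1   # last segment starting at or before hi
    return first, last
```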


In [27]:
formula_metrics_df.shape

(37751, 7)

In [28]:
formula_metrics_df.head()

| formula_i | chaos | spatial | spectral | msm | total_iso_ints | min_iso_ints | max_iso_ints |
|---|---|---|---|---|---|---|---|
| 1136298 | 0.989467 | 0.005032 | 0.99346 | 0.004946 | [5469792.814208984, 8950.914428710938, 49009.3... | [0, 0, 0, 0] | [9209.248046875, 1063.4140625, 2859.4926757812... |
| 1136874 | 0.994783 | 0.005596 | 0.981136 | 0.005462 | [40829342.67486572, 10844.889709472656, 24336.... | [0, 0, 0, 0] | [60895.3046875, 1248.24755859375, 1485.7324218... |
| 1467671 | 0.99818 | 0.055214 | 0.990467 | 0.054588 | [53361335.24902344, 6351.7821044921875, 48062.... | [0, 0, 0, 0] | [25295.4765625, 1144.1231689453125, 1721.48474... |
| 1470137 | 0.99565 | 0.000279 | 0.990636 | 0.000275 | [14824101.560913086, 14878.147338867188, 56461... | [0, 0, 0, 0] | [7332.09912109375, 1066.228759765625, 2174.179... |
| 1471562 | 0.995417 | 0.024108 | 0.990455 | 0.023768 | [28735164.747802734, 1051.5220947265625, 21886... | [0, 0, 0, 0] | [9924.072265625, 1051.5220947265625, 1265.2122... |
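In METASPACE the MSM (Metabolite-Signal Match) score is the product of the measure of spatial chaos, the spatial isotope correlation and the spectral isotope pattern match, the three measures whose functions the notebook imports (`measure_of_chaos`, `isotope_image_correlation`, `isotope_pattern_match`). The rows printed above are consistent with `msm = chaos * spatial * spectral` up to display rounding, as this check shows; `msm_score` is a small illustrative helper.

```python
def msm_score(chaos: float, spatial: float, spectral: float) -> float:
    """MSM score: product of the chaos, spatial and spectral measures."""
    return chaos * spatial * spectral


# First rows of formula_metrics_df.head() as (formula_i, chaos, spatial, spectral, msm)
rows = [
    (1136298, 0.989467, 0.005032, 0.99346, 0.004946),
    (1136874, 0.994783, 0.005596, 0.981136, 0.005462),
    (1467671, 0.99818, 0.055214, 0.990467, 0.054588),
]

for formula_i, chaos, spatial, spectral, msm in rows:
    # The printed values are rounded to 6 decimals, so compare with a tolerance
    assert abs(msm_score(chaos, spatial, spectral) - msm) < 5e-6, formula_i

# Rank formulas by MSM, best first
ranked = sorted(rows, key=lambda r: r[4], reverse=True)
```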


# Clean Temp Data

In [None]:
from annotation_pipeline.dataset_segmentation import clean_segments

In [None]:
%%time
clean_segments(config, input_data)
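`clean_segments` removes the temporary segments, but its internals are not shown here. As an illustrative alternative, the sketch below deletes everything under one prefix with the S3-compatible `list_objects_v2` and `delete_objects` calls, one listing page (up to 1000 keys by default) at a time, which stays within the 1000-key limit of a single `delete_objects` request. `delete_prefix` is a hypothetical helper.

```python
def delete_prefix(cos_client, bucket: str, prefix: str) -> int:
    """Delete every object under `prefix`, one listing page at a time.

    Uses boto3-style list_objects_v2 / delete_objects calls and returns
    the number of keys deleted.
    """
    deleted = 0
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = cos_client.list_objects_v2(**kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            cos_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        # Continue listing after the last key of the previous page
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

For example, `delete_prefix(cos_client, input_data['bucket'], input_data['ds_segments'])`, and the same call with `input_data['centr_segments']`. Double-check the prefix first: a prefix that is too short deletes far more than intended.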