# 01. Download Data

This notebook downloads the example data used by the other notebooks. Run it first so that the data files are available. It does the following:

a. Download the HMDB database and extract its metabolites.

b. Download the 19 beer .mzML files used as examples in the paper.

c. Train kernel density estimators on the mzML files.

d. Extract regions of interest from the mzML files.

In [1]:
%matplotlib inline

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import os

In [4]:
import sys
sys.path.append('..')

In [12]:
from vimms.DataGenerator import download_file, extract_hmdb_metabolite, extract_zip_file
from vimms.MassSpec import IndependentMassSpectrometer
from vimms.Controller import SimpleMs1Controller
from vimms.Common import *

## a. Download metabolites from HMDB

Here we try to load a pre-processed pickle file of metabolites from the current folder. If it is not found, we download the database and extract the compounds from HMDB.

In [6]:
compound_file = 'hmdb_compounds.p'
hmdb_compounds = load_obj(compound_file)
if hmdb_compounds is None: # if file does not exist

    # download the entire HMDB metabolite database, big and slow!!
    # url = 'http://www.hmdb.ca/system/downloads/current/hmdb_metabolites.zip'

    # download a smaller urine metabolite database for testing
    url = 'http://www.hmdb.ca/system/downloads/current/urine_metabolites.zip'

    out_file = download_file(url)
    # store the result in hmdb_compounds so the variable is set on both branches
    hmdb_compounds = extract_hmdb_metabolite(out_file, delete=True)
    save_obj(hmdb_compounds, compound_file)

else:
    print('Loaded %d DatabaseCompounds from %s' % (len(hmdb_compounds), compound_file))

Loaded 114087 DatabaseCompounds from hmdb_compounds.p
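
The load-or-download logic in this section is an instance of a general "cache on disk" pattern. A minimal sketch of that pattern using only the standard library is given here. The helper name `load_or_build` and its arguments are hypothetical and are not part of ViMMS; the notebook itself relies on `load_obj` and `save_obj`, whose internals may differ.

```python
import os
import pickle


def load_or_build(path, build_fn):
    """Return the object pickled at `path`, building and caching it if absent.

    Hypothetical helper for illustration only; not a ViMMS function.
    """
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)

    obj = build_fn()  # the expensive step, e.g. download and parse a database
    with open(path, 'wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return obj
```

With a helper like this, the expensive step runs only the first time; later runs read the pickle file instead.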


## b. Download beer and urine files

We also download the beer and urine .mzML files used as examples in the paper.

In [7]:
url = 'https://www.dropbox.com/s/rmktzngfflf1ll7/manuscript_data.zip?dl=1'
out_file = 'manuscript_data.zip'

In [8]:
download_file(url, out_file)

Downloading manuscript_data.zip


839kKB [01:17, 10.8kKB/s]


'manuscript_data.zip'

In [9]:
extract_zip_file(out_file, delete=True)

Extracting manuscript_data.zip


100%|█████████████████████████████████████████████████████████████████████████████████████████████| 89/89 [00:43<00:00,  5.96it/s]


Deleting manuscript_data.zip
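
For readers who want to see what the download and extraction steps amount to, here is a rough standard-library sketch. The functions `fetch` and `unzip` are hypothetical stand-ins written for illustration. They are not the ViMMS `download_file` and `extract_zip_file` implementations, which also report progress and may behave differently.

```python
import os
import shutil
import urllib.request
import zipfile


def fetch(url, out_file):
    """Stream the content at `url` into `out_file` and return the local path."""
    with urllib.request.urlopen(url) as response, open(out_file, 'wb') as f:
        shutil.copyfileobj(response, f)
    return out_file


def unzip(zip_path, target_dir='.', delete=False):
    """Extract all members of `zip_path` into `target_dir`.

    Returns the list of member names; removes the archive if `delete` is True.
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(target_dir)
    if delete:
        os.remove(zip_path)
    return names
```

Passing `delete=True` mirrors the cell above, where the archive is removed once its contents have been extracted.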


In [10]:
data_dir = os.path.join(os.getcwd(), 'manuscript_data')
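
Before moving on, it can be useful to confirm that the extraction produced the expected .mzML files. The sketch below is one way to list them. The helper `find_mzml_files` is hypothetical, and because the folder layout inside `manuscript_data` is not described here, it simply searches all subfolders.

```python
import glob
import os


def find_mzml_files(data_dir):
    """Return the sorted paths of all .mzML files under `data_dir`, including subfolders."""
    pattern = os.path.join(data_dir, '**', '*.mzML')
    return sorted(glob.glob(pattern, recursive=True))
```

Calling `find_mzml_files(data_dir)` and checking the length of the result gives a quick sanity check that the download completed.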

## c. Train the KDEs

## d. Extract the ROIs