# 1. Download Data

This notebook downloads the necessary example data that will be used in other notebooks. In particular, the notebook does the following:

- Download the beer and urine .mzML files used as examples in the paper.
- Download a pre-processed set of metabolites extracted from the HMDB database.

**Please run this notebook first to make sure the data files are available for subsequent notebooks. It might take a while, so please be patient and let the notebook run to completion.**

The data files downloaded by this notebook should contain nearly everything needed to replicate the results in the paper. To run the simulations on your own data instead, replace the paths below so that they point to your files.
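If you do point the notebooks at your own data, it can help to confirm first that the folder actually contains .mzML files. The following is a minimal sketch using only the standard library; `list_mzml_files` is a hypothetical helper written for illustration and is not part of vimms.

```python
from pathlib import Path


def list_mzml_files(data_dir):
    """Return all .mzML files under data_dir (searched recursively), sorted by path."""
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f'{root} is not a directory')
    # compare suffixes case-insensitively so that .mzML and .mzml both match
    return sorted(p for p in root.rglob('*')
                  if p.is_file() and p.suffix.lower() == '.mzml')
```

Calling `list_mzml_files('example_data')` after this notebook has run should return the extracted example files; an empty list suggests the path is wrong or the download did not complete.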

Alternatively, if you just want to quickly try some controllers (fragmentation strategies) using our test fixtures, take a look at the test cases instead.

In [1]:
%matplotlib inline

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import os
from pathlib import Path
import glob

In [4]:
import sys
sys.path.append('../..')

In [5]:
from vimms.FeatureExtraction import extract_hmdb_metabolite
from vimms.Common import set_log_level_debug, download_file, extract_zip_file, load_obj

In [6]:
set_log_level_debug()

1

## a. Download example mzML files

Here we download the beer and urine .mzML files used as examples in the paper, unless the `example_data` folder already exists.

In [7]:
url = 'https://github.com/glasgowcompbio/vimms-data/raw/main/example_data.zip'
base_dir = os.path.join(os.getcwd(), 'example_data')

In [8]:
if not os.path.isdir(base_dir):  # example data folder is missing, so download and extract it
    print('Creating %s' % base_dir)
    out_file = 'example_data.zip'
    download_file(url, out_file)
    extract_zip_file(out_file, delete=True)
else:
    print('Found %s' % base_dir)

Found /Users/joewandy/Work/git/vimms/examples/01. vimms (Wandy et al 2019)/example_data
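The cell above follows a "download only if missing" pattern. For readers who want to reuse that pattern outside vimms, here is a rough standard-library sketch of the same idea. `fetch_and_extract` and its `zip_name` parameter are hypothetical names chosen for illustration; this is not how `download_file` or `extract_zip_file` are implemented, and it assumes the archive contains a top-level folder with the same name as `target_dir`.

```python
import os
import urllib.request
import zipfile


def fetch_and_extract(url, target_dir, zip_name='download.zip'):
    """Download a zip from url and unpack it, unless target_dir already exists.

    Returns True if a download happened, False if target_dir was found.
    """
    if os.path.isdir(target_dir):
        return False
    parent = os.path.dirname(os.path.abspath(target_dir))
    zip_path = os.path.join(parent, zip_name)
    urllib.request.urlretrieve(url, zip_path)  # fetch the archive to disk
    with zipfile.ZipFile(zip_path) as zf:
        # assumption: the archive holds a top-level folder named like target_dir
        zf.extractall(parent)
    os.remove(zip_path)  # discard the archive once extracted
    return True
```

Because the check is on the folder rather than the archive, re-running the cell after a successful download is cheap, but a partially extracted folder would also be treated as complete.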


## b. Download metabolites from HMDB

Next we load a pre-processed pickled file of database metabolites from the `base_dir` folder. If the file is not found, we download a pre-processed copy. The commented-out code shows how to build the file from the full HMDB download instead.

In [9]:
out_file = 'hmdb_compounds.p'
compound_file = Path(base_dir, out_file)
try:
    hmdb_compounds = load_obj(compound_file)
except FileNotFoundError:

    # download the entire HMDB metabolite database and extract chemicals from it
    # (note: save_obj is not imported above; import it before uncommenting)
    # url = 'http://www.hmdb.ca/system/downloads/current/hmdb_metabolites.zip'
    # out_file = download_file(url)
    # compounds = extract_hmdb_metabolite(out_file, delete=True)
    # save_obj(compounds, compound_file)

    # the above could be quite slow, so download a pre-processed result instead
    url = 'https://github.com/glasgowcompbio/vimms-data/raw/main/hmdb_compounds.p'
    download_file(url, compound_file)
    hmdb_compounds = load_obj(compound_file)  # load the freshly downloaded file
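The try/except above is a small "load from cache, otherwise create" pattern. A generic version might look like the sketch below. `load_or_create` is a hypothetical helper, and it uses plain `pickle`, which is an assumption made for illustration: files written this way are not guaranteed to be readable by `load_obj`, and vice versa.

```python
import pickle
from pathlib import Path


def load_or_create(cache_file, create):
    """Load a pickled object from cache_file; on a miss, build it, cache it, return it."""
    cache_file = Path(cache_file)
    try:
        with cache_file.open('rb') as f:
            return pickle.load(f)
    except FileNotFoundError:
        obj = create()  # e.g. download and parse the source database
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        with cache_file.open('wb') as f:
            pickle.dump(obj, f)
        return obj
```

Passing the expensive step as a callable keeps the slow path (a full database download and parse) out of the way unless the cached file is really missing.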