# 2. Extracting Chemicals from an mzML file

This notebook demonstrates how to extract regions of interest (ROIs) from an existing mzML file and use them as input to the simulator in ViMMS. The extracted ROIs are converted into `UnknownChemical` objects, unlike the example in **01. Extracting Chemicals from HMDB.ipynb**, where we operated on `KnownChemical` objects from HMDB.

In [1]:
%matplotlib inline

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
from pathlib import Path
from loguru import logger

In [4]:
import os
import sys
sys.path.append('../..')

In [5]:
from vimms.Common import download_file, extract_zip_file, set_log_level_debug, set_log_level_warning, \
    save_obj, load_obj, POSITIVE
from vimms.Roi import RoiParams
from vimms.Chemicals import ChemicalMixtureFromMZML

In [6]:
from vimms.MassSpec import IndependentMassSpectrometer
from vimms.Controller import TopNController
from vimms.Environment import Environment

### Download mzML fragmentation files for demo

These mzML files contain fragmentation data from beer and urine samples and were also used in the first ViMMS manuscript.

In [7]:
url = 'http://researchdata.gla.ac.uk/870/2/example_data.zip'
base_dir = os.path.join(os.getcwd(), 'example_data')

In [8]:
if not os.path.isdir(base_dir):  # download and extract the example data if it is not already present
    print('Creating %s' % base_dir)
    out_file = 'example_data.zip'
    download_file(url, out_file)
    extract_zip_file(out_file, delete=True)
else:
    print('Found %s' % base_dir)

Found C:\Users\joewa\Work\git\vimms\demo\01. Data\example_data


### Extract chemicals from beer and urine samples

Extract chemicals from the beer and urine mzML files using the `ChemicalMixtureFromMZML` class. The result is a list of `UnknownChemical` objects for each input mzML file. Once created, the list of `UnknownChemical` objects can be persisted to the file system by calling the `save_obj` function from ViMMS.

In [9]:
param_dict = {
    'mz_tol': 5,
    'mz_units': 'ppm',
    'min_length': 1,
    'min_intensity': 1.75E5,
    'start_rt': 0,
    'stop_rt': 1440
}
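To make the role of these parameters more concrete, here is a simplified sketch of ROI building: peaks from consecutive scans are chained together when their m/z values agree within a ppm tolerance, and the finished ROIs are then filtered by length and maximum intensity. This is an illustration under stated assumptions, not the ViMMS implementation. `SimpleRoi`, `within_ppm` and `build_rois` are hypothetical names, only `mz_units='ppm'` is handled, and the `start_rt`/`stop_rt` limits are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleRoi:
    """Hypothetical minimal ROI: peaks of similar m/z in consecutive scans."""
    mzs: list = field(default_factory=list)
    rts: list = field(default_factory=list)
    intensities: list = field(default_factory=list)

    @property
    def mean_mz(self):
        return sum(self.mzs) / len(self.mzs)

    def add(self, mz, rt, intensity):
        self.mzs.append(mz)
        self.rts.append(rt)
        self.intensities.append(intensity)


def within_ppm(mz_a, mz_b, tol_ppm):
    """True if mz_a lies within tol_ppm parts-per-million of mz_b."""
    return abs(mz_a - mz_b) <= mz_b * tol_ppm / 1e6


def build_rois(scans, mz_tol=5, min_length=1, min_intensity=1.75e5):
    """scans: list of (rt, [(mz, intensity), ...]) tuples in RT order."""
    live, finished = [], []
    for rt, peaks in scans:
        extended = set()  # indices of live ROIs that received a peak in this scan
        for mz, intensity in sorted(peaks):
            match = None
            for i, roi in enumerate(live):
                if i not in extended and within_ppm(mz, roi.mean_mz, mz_tol):
                    match = i
                    break
            if match is None:  # no live ROI fits, so start a new one
                live.append(SimpleRoi())
                match = len(live) - 1
            live[match].add(mz, rt, intensity)
            extended.add(match)
        # ROIs that were not extended in this scan are closed
        finished.extend(r for i, r in enumerate(live) if i not in extended)
        live = [r for i, r in enumerate(live) if i in extended]
    finished.extend(live)
    # keep only ROIs that are long enough and intense enough
    return [r for r in finished
            if len(r.mzs) >= min_length and max(r.intensities) >= min_intensity]


# Toy input: one intense trace near m/z 100 and one weak trace near m/z 200
toy_scans = [
    (1.0, [(100.0000, 2e5), (200.0000, 1e4)]),
    (2.0, [(100.0003, 3e5), (200.0005, 2e4)]),
    (3.0, [(100.0001, 1e5)]),
]
toy_rois = build_rois(toy_scans)
```

In this toy input the trace near m/z 200 never reaches `min_intensity`, so only the trace near m/z 100 survives the filter.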

#### Beer files

In [10]:
mzml_file = os.path.join(base_dir, 'beers', 'fragmentation', 'mzML', 'Beer_multibeers_1_T10_POS.mzML')
mzml_file

'C:\\Users\\joewa\\Work\\git\\vimms\\demo\\01. Data\\example_data\\beers\\fragmentation\\mzML\\Beer_multibeers_1_T10_POS.mzML'

In [11]:
rp = RoiParams(**param_dict)
cm = ChemicalMixtureFromMZML(mzml_file, roi_params=rp)
dataset = cm.sample(None, 2)

2021-08-27 23:47:45.225 | DEBUG    | vimms.Chemicals:_extract_rois:349 - Extracted 25759 good ROIs from C:\Users\joewa\Work\git\vimms\demo\01. Data\example_data\beers\fragmentation\mzML\Beer_multibeers_1_T10_POS.mzML


In [12]:
out_name = os.path.join(base_dir, 'beers', 'datasets', 'beer_1.p')
save_obj(dataset, out_name)

2021-08-27 23:47:47.691 | INFO     | vimms.Common:save_obj:299 - Saving <class 'list'> to C:\Users\joewa\Work\git\vimms\demo\01. Data\example_data\beers\datasets\beer_1.p
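For readers curious about what this kind of persistence involves, the following is a minimal stand-alone sketch of writing a Python list to disk and reading it back. It assumes a gzip-compressed pickle; the helper names `save_pickle_gz` and `load_pickle_gz` are hypothetical and are not the ViMMS API, and the actual on-disk format used by `save_obj` should be checked against the ViMMS source.

```python
import gzip
import os
import pickle
import tempfile


def save_pickle_gz(obj, path):
    """Hypothetical helper: pickle obj into a gzip-compressed file."""
    # create the parent directory if it does not exist yet
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    with gzip.open(path, 'wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle_gz(path):
    """Hypothetical helper: read back an object written by save_pickle_gz."""
    with gzip.open(path, 'rb') as f:
        return pickle.load(f)


# Round-trip a small list through a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'datasets', 'demo.p')
    save_pickle_gz([{'mz': 100.0, 'rt': 12.5}], demo_path)
    restored = load_pickle_gz(demo_path)
```

Pickle files can execute arbitrary code when loaded, so they should only be read from trusted sources.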


#### Urine files

In [13]:
mzml_file = os.path.join(base_dir, 'urines', 'fragmentation', 'mzML', 'Urine_StrokeDrugs_02_T10_POS.mzML')
mzml_file

'C:\\Users\\joewa\\Work\\git\\vimms\\demo\\01. Data\\example_data\\urines\\fragmentation\\mzML\\Urine_StrokeDrugs_02_T10_POS.mzML'

In [14]:
rp = RoiParams(**param_dict)
cm = ChemicalMixtureFromMZML(mzml_file, roi_params=rp)
dataset = cm.sample(None, 2)

2021-08-27 23:48:02.278 | DEBUG    | vimms.Chemicals:_extract_rois:349 - Extracted 37233 good ROIs from C:\Users\joewa\Work\git\vimms\demo\01. Data\example_data\urines\fragmentation\mzML\Urine_StrokeDrugs_02_T10_POS.mzML


In [15]:
out_name = os.path.join(base_dir, 'urines', 'datasets', 'urine_1.p')
save_obj(dataset, out_name)

2021-08-27 23:48:05.419 | INFO     | vimms.Common:save_obj:299 - Saving <class 'list'> to C:\Users\joewa\Work\git\vimms\demo\01. Data\example_data\urines\datasets\urine_1.p


### Use in simulator

Perform two simulated injections, one using the beer dataset and one using the urine dataset. First, we load the saved datasets from the steps above.

In [16]:
beer_dataset = load_obj(os.path.join(base_dir, 'beers', 'datasets', 'beer_1.p'))
urine_dataset = load_obj(os.path.join(base_dir, 'urines', 'datasets', 'urine_1.p'))
datasets = {
    'beer': beer_dataset, 
    'urine': urine_dataset
}

In [17]:
rt_range = [(0, 1440)]
min_rt = rt_range[0][0]
max_rt = rt_range[0][1]

In [18]:
isolation_window = 1
N = 3
rt_tol = 15
mz_tol = 10
min_ms1_intensity = 1.75E5
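As a rough guide to what these settings control, the sketch below shows a simplified Top-N selection with dynamic exclusion: from each MS1 scan, the `N` most intense peaks above `min_ms1_intensity` are picked for fragmentation, skipping any precursor fragmented within the last `rt_tol` seconds and within `mz_tol` of the candidate. It is an illustration only, not the `TopNController` implementation. The function `select_top_n` is hypothetical, `mz_tol` is assumed to be in ppm, and the isolation window is not modelled.

```python
def select_top_n(peaks, rt, exclusion, n=3, mz_tol=10, rt_tol=15,
                 min_intensity=1.75e5):
    """Pick up to n precursor m/z values from one MS1 scan.

    peaks: list of (mz, intensity) observed at retention time rt (seconds).
    exclusion: list of (mz, rt) for precursors already fragmented; updated in place.
    """
    def is_excluded(mz):
        # excluded if a recent precursor lies within mz_tol ppm and rt_tol seconds
        return any(abs(mz - ex_mz) <= ex_mz * mz_tol / 1e6 and rt - ex_rt <= rt_tol
                   for ex_mz, ex_rt in exclusion)

    chosen = []
    # visit peaks from most to least intense
    for mz, intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if len(chosen) == n or intensity < min_intensity:
            break
        if is_excluded(mz):
            continue
        chosen.append(mz)
        exclusion.append((mz, rt))
    return chosen


toy_scan = [(100.0, 5e5), (200.0, 4e5), (300.0, 3e5), (400.0, 2e5), (500.0, 1e5)]
exclusion_list = []
first = select_top_n(toy_scan, 10.0, exclusion_list)   # three most intense peaks
second = select_top_n(toy_scan, 12.0, exclusion_list)  # those three are now excluded
third = select_top_n(toy_scan, 30.0, exclusion_list)   # exclusions have expired
```

In the second call the three most intense peaks are still excluded, so the next eligible peak is selected instead; the peak at m/z 500 is never chosen because it is below the intensity threshold.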

Initialise the simulated mass spectrometer and the Top-N controller for each of the two datasets (beer and urine chemicals). In a loop, perform one simulated injection per dataset using the Top-N fragmentation strategy, and write each result to an mzML file.

In [19]:
set_log_level_warning()
for label in datasets:
    logger.warning('Processing %s' % label)
    dataset = datasets[label]
    mass_spec = IndependentMassSpectrometer(POSITIVE, dataset)
    controller = TopNController(POSITIVE, N, isolation_window, mz_tol, rt_tol, min_ms1_intensity)    
    env = Environment(mass_spec, controller, min_rt, max_rt, progress_bar=True)
    env.run()
    
    mzml_filename = '%s_topn_controller.mzML' % label
    out_dir = os.path.join(os.getcwd(), 'results')
    env.write_mzML(out_dir, mzml_filename)

(1440.000s) ms_level=2: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 1439.8000000001357/1440 [00:42<00:00, 33.94it/s]
(1440.000s) ms_level=2: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 1439.8000000001355/1440 [00:51<00:00, 27.94it/s]


The simulated mzML files have now been written to the `results` directory. You can inspect them with TOPPView from OpenMS or any other mzML viewer.
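For a quick check without a viewer, the sketch below counts the spectra in an mzML file by MS level using only the Python standard library. It relies on the standard mzML XML namespace and the PSI-MS controlled vocabulary term `MS:1000511` ("ms level"). The function name `count_spectra_by_ms_level` is hypothetical, and the sketch assumes the MS level is stored as a direct `cvParam` child of each `spectrum` element.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter

MZML_NS = '{http://psi.hupo.org/ms/mzml}'
MS_LEVEL_ACCESSION = 'MS:1000511'  # PSI-MS term "ms level"


def count_spectra_by_ms_level(path):
    """Return a dict mapping MS level to the number of spectra in the file."""
    counts = Counter()
    # stream through the file so large mzML files are not loaded all at once
    for _, elem in ET.iterparse(path, events=('end',)):
        if elem.tag == MZML_NS + 'spectrum':
            for cv in elem.findall(MZML_NS + 'cvParam'):
                if cv.get('accession') == MS_LEVEL_ACCESSION:
                    counts[int(cv.get('value'))] += 1
                    break
            elem.clear()  # release the memory held by this spectrum
    return dict(counts)


# Only runs if the simulated beer file from the loop above is present
result_file = os.path.join(os.getcwd(), 'results', 'beer_topn_controller.mzML')
if os.path.exists(result_file):
    print(count_spectra_by_ms_level(result_file))
```

With a Top-N strategy, the MS2 count would be expected to be at most `N` times the MS1 count.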