# 2. Generating a Sample Using the MS1 Controller

In this notebook, we demonstrate how ViMMS can be used to generate a full-scan mzML file from a single sample. This corresponds to Section 3.1 of the paper.

In [1]:
%matplotlib inline

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import sys
sys.path.append('../..')

In [4]:
import os
from pathlib import Path

In [5]:
from vimms.Chemicals import ChemicalCreator
from vimms.MassSpec import IndependentMassSpectrometer
from vimms.Controller import SimpleMs1Controller
from vimms.Environment import Environment
from vimms.Common import *

Load the previously trained spectral feature database and the list of extracted metabolites. Both were created in **01. Download Data.ipynb**.

In [6]:
base_dir = os.path.abspath('example_data')
ps = load_obj(Path(base_dir, 'peak_sampler_mz_rt_int_19_beers_fullscan.p'))
hmdb = load_obj(Path(base_dir, 'hmdb_compounds.p'))
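`load_obj` and `save_obj` come from `vimms.Common`, and their implementation is not shown in this notebook. The sketch below assumes they are thin wrappers around Python's `pickle` module. It shows the kind of round trip involved when the peak sampler and HMDB list are loaded here, and when the dataset is saved later. The helper names `save_pickle` and `load_pickle` are hypothetical and do not come from ViMMS.

```python
import pickle
from pathlib import Path


def save_pickle(obj, filename):
    """Hypothetical helper (not ViMMS): write `obj` to `filename` with pickle."""
    path = Path(filename)
    # create the output folder (e.g. results/MS1_single) if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open('wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(filename):
    """Hypothetical helper (not ViMMS): read back an object written by save_pickle.

    Only unpickle files you trust: pickle can execute arbitrary code on load.
    """
    with Path(filename).open('rb') as f:
        return pickle.load(f)
```

Creating the parent folder before writing is a deliberate choice. An output path such as `results/MS1_single` may not exist on a fresh checkout.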

Set the ViMMS logging level to DEBUG so that progress messages are displayed.

In [7]:
set_log_level_debug()

## Create Chemicals

Define the output folder where the results will be stored.

In [8]:
out_dir = Path(base_dir, 'results', 'MS1_single')

Here we generate the chemical objects that make up the sample. Their formulae are sampled from the metabolites in the Human Metabolome Database (HMDB).
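To make the role of the parameters below concrete, here is a toy sketch of constrained sampling. It is not how `ChemicalCreator` works internally. ViMMS draws retention times and intensities from the trained peak sampler and ROI sources. This sketch swaps in plain uniform and log-uniform draws so that it stays self-contained. `ToyChemical`, `sample_chemicals` and the `(formula, mz)` database format are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyChemical:
    # Hypothetical stand-in for a ViMMS chemical: formula, m/z, RT (s) and peak height.
    formula: str
    mz: float
    rt: float
    max_intensity: float


def sample_chemicals(database, mz_range, rt_range, min_intensity, n_chems, seed=None):
    """Draw n_chems toy chemicals whose m/z lies inside mz_range.

    `database` is a list of (formula, mz) pairs standing in for HMDB.
    RT is drawn uniformly from rt_range; intensity is drawn log-uniformly
    between min_intensity and 100 * min_intensity (both are assumptions).
    """
    rng = random.Random(seed)
    mz_lo, mz_hi = mz_range
    rt_lo, rt_hi = rt_range
    # keep only database entries that fall inside the requested m/z window
    candidates = [(f, mz) for f, mz in database if mz_lo <= mz <= mz_hi]
    chems = []
    for _ in range(n_chems):
        formula, mz = rng.choice(candidates)  # sample formulae with replacement
        rt = rng.uniform(rt_lo, rt_hi)
        intensity = min_intensity * 10 ** rng.uniform(0, 2)  # never below min_intensity
        chems.append(ToyChemical(formula, mz, rt, intensity))
    return chems
```

In the notebook, `mz_range` and `rt_range` are lists holding a single tuple. A call mirroring the parameters below would therefore be `sample_chemicals(db, mz_range[0], rt_range[0], min_ms1_intensity, n_chems)`.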

In [9]:
# the list of ROI sources created in the previous notebook '01. Download Data.ipynb'
ROI_Sources = [str(Path(base_dir, 'DsDA', 'DsDA_Beer', 'beer_t10_simulator_files'))]

# minimum MS1 intensity of chemicals
min_ms1_intensity = 1.75E5

# m/z and RT range of chemicals
rt_range = [(0, 1440)]
mz_range = [(0, 1050)]

# the number of chemicals in the sample
n_chems = 6500

# maximum MS level (we do not generate fragmentation peaks when this value is 1)
ms_level = 1

In [10]:
chems = ChemicalCreator(ps, ROI_Sources, hmdb)
dataset = chems.sample(mz_range, rt_range, min_ms1_intensity, n_chems, ms_level)
save_obj(dataset, Path(out_dir, 'dataset.p'))

2019-12-12 11:23:56.330 | DEBUG    | vimms.Chemicals:__init__:239 - Sorting database compounds by masses
2019-12-12 11:24:00.573 | DEBUG    | vimms.Chemicals:sample:272 - 6500 chemicals to be created.
2019-12-12 11:24:01.289 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 0/6500
2019-12-12 11:24:05.648 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 500/6500
2019-12-12 11:24:09.752 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 1000/6500
2019-12-12 11:24:13.966 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 1500/6500
2019-12-12 11:24:18.393 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 2000/6500
2019-12-12 11:24:22.101 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 2500/6500
2019-12-12 11:24:27.012 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 3000/6500
2019-12-12 11:24:31.498 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampli

In [11]:
for chem in dataset[0:10]:
    print(chem)

KnownChemical - 'C11H11F3N2O4' rt=262.33 max_intensity=187966.49
KnownChemical - 'C30H50O3' rt=429.48 max_intensity=549845.79
KnownChemical - 'C14H19NO10S2' rt=510.49 max_intensity=904802.52
KnownChemical - 'C14H26N2O3S2' rt=551.37 max_intensity=211304.57
KnownChemical - 'C15H22FN3O6' rt=598.74 max_intensity=375567.61
KnownChemical - 'C7H17N3' rt=892.56 max_intensity=356541.82
KnownChemical - 'C10H8O6' rt=426.60 max_intensity=1425028.33
KnownChemical - 'C21H21O10' rt=430.23 max_intensity=271710.28
KnownChemical - 'C26H20O7' rt=311.66 max_intensity=1456420.83
KnownChemical - 'C2H6O5S' rt=212.72 max_intensity=518452.38
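The printed formulae relate back to the `mz_range` used above through their masses. The following standalone sketch computes the monoisotopic mass of a formula and the m/z of its singly protonated ion [M+H]+. ViMMS has its own internal mass and adduct handling, so this is only an illustration. The function names are hypothetical. Charged species (such as `C21H21O10` above) and other adducts are ignored.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element used here.
MONOISOTOPIC = {
    'C': 12.0,
    'H': 1.00782503207,
    'N': 14.0030740048,
    'O': 15.99491461956,
    'S': 31.97207100,
    'F': 18.99840322,
}
PROTON = 1.007276  # proton mass, added for the [M+H]+ adduct


def parse_formula(formula):
    # Split e.g. 'C10H8O6' into {'C': 10, 'H': 8, 'O': 6}; a missing count means 1.
    counts = {}
    for element, n in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def monoisotopic_mass(formula):
    # Sum of element counts times their monoisotopic masses.
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def mz_mh(formula):
    # m/z of the singly charged protonated ion [M+H]+
    return monoisotopic_mass(formula) + PROTON
```

For example, `mz_mh('C10H8O6')` evaluates to roughly 225.039, well inside the 0 to 1050 m/z window.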


## Run the MS1 controller on the sample and generate an .mzML file

In [12]:
set_log_level_warning()

In [13]:
min_rt = rt_range[0][0]
max_rt = rt_range[0][1]

In [14]:
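# simulated mass spectrometer in positive ionisation mode, measuring the generated chemicals
mass_spec = IndependentMassSpectrometer(POSITIVE, dataset, ps)
# controller that schedules only MS1 (full-scan) acquisitions
controller = SimpleMs1Controller()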

In [15]:
# create an environment to run both the mass spec and controller
env = Environment(mass_spec, controller, min_rt, max_rt, progress_bar=True)

# set the log level to WARNING so that we don't see too many messages while the environment is running
set_log_level_warning()

# run the simulation
env.run()

(1440.911s) ms_level=1: 100%|█████████▉| 1439.5008199999984/1440 [00:59<00:00, 24.17it/s] 
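Conceptually, a full-scan controller asks for one MS1 scan after another until the retention time window is exhausted. Each scan records the chemicals eluting at that moment. The toy loop below illustrates this idea. It is not the ViMMS `Environment` implementation. The Gaussian elution profile, the fixed `scan_time` of 0.4 s, and the names `Peak` and `run_full_scan` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    # Hypothetical chemical with a Gaussian elution profile.
    mz: float
    rt: float            # apex retention time (s)
    max_intensity: float
    width: float = 5.0   # standard deviation of the elution profile (s), assumed

    def intensity_at(self, t):
        return self.max_intensity * math.exp(-0.5 * ((t - self.rt) / self.width) ** 2)


def run_full_scan(peaks, min_rt, max_rt, scan_time=0.4, min_intensity=1.0):
    """Emit one MS1 scan every `scan_time` seconds in [min_rt, max_rt).

    Each scan is (rt, [(mz, intensity), ...]) listing peaks above min_intensity,
    sorted by m/z. Time advances uniformly because no MS2 scans are scheduled.
    """
    scans = []
    n_scans = math.ceil((max_rt - min_rt) / scan_time)
    for i in range(n_scans):
        t = min_rt + i * scan_time  # computed from the index to avoid float drift
        spectrum = [(p.mz, p.intensity_at(t)) for p in peaks]
        spectrum = [(mz, inten) for mz, inten in spectrum if inten >= min_intensity]
        scans.append((t, sorted(spectrum)))
    return scans
```

A real simulator also models noise, isotopes and MS-level-dependent scan durations. This sketch leaves all of that out and keeps only the "scan everything at every time step" idea behind an MS1 controller.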


The simulated results are saved to the following .mzML file. You can view it in [TOPPView](https://pubs.acs.org/doi/abs/10.1021/pr900171m) or any other mzML viewer.

In [16]:
set_log_level_debug()
mzml_filename = 'ms1_controller.mzML'
env.write_mzML(out_dir, mzml_filename)

2019-12-12 11:26:31.002 | DEBUG    | vimms.Environment:write_mzML:142 - Writing mzML file to /home/joewandy/git/vimms/examples/example_data/results/MS1_single/ms1_controller.mzML
2019-12-12 11:26:34.475 | DEBUG    | vimms.Environment:write_mzML:149 - mzML file successfully written!
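As a quick sanity check on the written file, you can count its spectra by MS level using only the standard library. This relies on two facts: mzML is XML, and the PSI-MS term `MS:1000511` ("ms level") is stored as a `cvParam` child of each `<spectrum>`. The function name is hypothetical, and the sketch does not use ViMMS or a dedicated mzML library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MS_LEVEL_ACCESSION = 'MS:1000511'  # PSI-MS controlled vocabulary term for "ms level"


def local_name(tag):
    # Strip an XML namespace, e.g. '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'
    return tag.rsplit('}', 1)[-1]


def count_spectra_by_ms_level(source):
    """Count <spectrum> elements in an mzML file (path or binary file object) by MS level."""
    counts = Counter()
    for _, elem in ET.iterparse(source, events=('end',)):
        if local_name(elem.tag) == 'spectrum':
            level = None  # stays None if the spectrum has no ms level cvParam
            for child in elem:
                if local_name(child.tag) == 'cvParam' and child.get('accession') == MS_LEVEL_ACCESSION:
                    level = int(child.get('value'))
            counts[level] += 1
            elem.clear()  # release the parsed spectrum to keep memory use low
    return counts
```

You could call it as `count_spectra_by_ms_level(str(Path(out_dir, mzml_filename)))`. For a file produced by the MS1 controller, we would expect every spectrum to report MS level 1.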
