# 2. Generating a Sample using MS1 Controller

In this notebook, we demonstrate how ViMMS can be used to generate a full-scan mzML file from a single sample. This corresponds to Section 3.1 of the paper.

In [1]:
%matplotlib inline

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import sys
sys.path.append('../..')

In [4]:
import os
from pathlib import Path

In [5]:
from vimms.Chemicals import ChemicalCreator
from vimms.MassSpec import IndependentMassSpectrometer
from vimms.Controller import SimpleMs1Controller
from vimms.Environment import Environment
from vimms.Common import *

Load previously trained spectral feature database and the list of extracted metabolites, created in **01. Download Data.ipynb**.

In [6]:
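base_dir = os.path.abspath('example_data')
# NOTE: the trained spectral feature database is used below as `ps`;
# it must also be loaded here with load_obj before running the later cells
hmdb = load_obj(Path(base_dir, 'hmdb_compounds.p'))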

Set the ViMMS logging level to DEBUG.

In [7]:
set_log_level_debug()

## Create Chemicals

Define the output folder where our results will be saved.

In [8]:
out_dir = Path(base_dir, 'results', 'MS1_single')

Here we generate the chemical objects that will be used in the sample. The chemical objects are generated by sampling from metabolites in the HMDB database.

Sample m/z values from the chemical formulae in HMDB

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from vimms.ChemicalSamplers import DatabaseFormulaSampler
from vimms.Common import load_obj

In [None]:
data_dir = os.path.abspath(os.path.join(os.getcwd(),'..','..','tests','fixtures'))
HMDB = os.path.join(data_dir,'hmdb_compounds.p')
hmdb = load_obj(HMDB)

In [None]:
# create a database formula sampler that will sample from HMDB with m/z between 100 and 1000
df = DatabaseFormulaSampler(hmdb, min_mz=100, max_mz=1000)
samples = df.sample(1000)
mz_list = [s[0].mass for s in samples]
plt.hist(mz_list)
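
The idea of sampling formulae inside a mass window can be illustrated without ViMMS. The snippet below is a minimal sketch, not the `DatabaseFormulaSampler` implementation: `TOY_DATABASE`, `sample_in_range` and its parameters are hypothetical names introduced here, the masses are approximate neutral monoisotopic masses, and the neutral mass is treated as the m/z value for simplicity.

```python
import random

# Hypothetical toy database of (formula, approximate monoisotopic mass) pairs.
TOY_DATABASE = [
    ("C6H12O6", 180.0634),    # glucose
    ("C8H10N4O2", 194.0804),  # caffeine
    ("C2H6O", 46.0419),       # ethanol, below the 100-1000 window
    ("C9H11NO2", 165.0790),   # phenylalanine
    ("C27H46O", 386.3549),    # cholesterol
]


def sample_in_range(database, n, min_mz, max_mz, seed=None):
    """Draw n entries (with replacement) whose mass lies in [min_mz, max_mz]."""
    rng = random.Random(seed)
    candidates = [entry for entry in database if min_mz <= entry[1] <= max_mz]
    if not candidates:
        raise ValueError("no database entry falls inside the requested range")
    return [rng.choice(candidates) for _ in range(n)]


samples = sample_in_range(TOY_DATABASE, n=10, min_mz=100, max_mz=1000, seed=0)
masses = [mass for _, mass in samples]
```

Filtering before drawing guarantees that every sampled entry respects the window, which is the behaviour the histogram above is meant to confirm.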

Sample RT and intensity values from mzML file

In [None]:
from vimms.ChemicalSamplers import MZMLRTandIntensitySampler

# MZML must hold the path to an existing mzML file; it is not defined earlier in this notebook
ri = MZMLRTandIntensitySampler(MZML)

rt_list = []
intensity_list = []

for i in range(1000):
    # the argument is a formula, but it is currently ignored
    rt, intensity = ri.sample(None)
    rt_list.append(rt)
    intensity_list.append(intensity)

plt.figure()
plt.hist(rt_list)
plt.figure()
plt.hist(intensity_list)

Sample chromatograms from mzML file

In [None]:
from vimms.ChemicalSamplers import MZMLChromatogramSampler
from vimms.Roi import RoiParams

# to set the parameters for ROI extraction from the mzML file, pass a RoiParams object, e.g.
roi_params = RoiParams(min_intensity=1000)
cs = MZMLChromatogramSampler(MZML, roi_params=roi_params)

# formula, example_rt and example_intensity are assumed to be defined beforehand
c = cs.sample(formula, example_rt, example_intensity)

# evaluate the sampled chromatogram at RT offsets relative to example_rt
rt_vals = np.linspace(50, 150)
intensities = [c.get_relative_intensity(r - example_rt) for r in rt_vals]
plt.plot(rt_vals, intensities)
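
To make the role of `get_relative_intensity` more concrete, here is a minimal sketch of a chromatogram modelled as a Gaussian elution profile. It is an illustrative stand-in and not the object returned by `MZMLChromatogramSampler`: the class `GaussianPeakShape`, its `sigma` parameter and the choice to measure the offset from the peak apex are all assumptions made for this example.

```python
import math


class GaussianPeakShape:
    """Hypothetical stand-in for a chromatogram: a Gaussian elution profile."""

    def __init__(self, sigma):
        self.sigma = sigma  # peak width (standard deviation) in seconds

    def get_relative_intensity(self, rt_offset):
        """Return the intensity relative to the apex (1.0) at a given RT offset."""
        return math.exp(-0.5 * (rt_offset / self.sigma) ** 2)


peak = GaussianPeakShape(sigma=10.0)
apex_rt = 100.0  # assumed apex retention time in seconds
rt_vals = [50.0 + 2.0 * i for i in range(51)]  # 50, 52, ..., 150
profile = [peak.get_relative_intensity(rt - apex_rt) for rt in rt_vals]
```

Multiplying such a relative profile by a chemical's maximum intensity would give its absolute intensity at each retention time.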

Sample MS2 spectra from mzML file

In [None]:
from vimms.ChemicalSamplers import MZMLMS2Sampler
ms = MZMLMS2Sampler(MZML)
# tc and plot_spectrum are not defined in this notebook and must be provided beforehand
a = ms.sample(tc)
mz_list = a[0]
intensity_list = a[1]
plot_spectrum(mz_list, intensity_list)

Put everything together

In [None]:
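from vimms.Chemicals import ChemicalMixtureCreator
from vimms.ChemicalSamplers import CRPMS2Sampler
cm = ChemicalMixtureCreator(df, ms2_sampler=CRPMS2Sampler(n_draws=100, alpha=2), chromatogram_sampler=MZMLChromatogramSampler(MZML))
chemicals = cm.sample(100, 2)  # 100 chemicals, up to MS level 2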

In [9]:
# the list of ROI sources created in the previous notebook '01. Download Data.ipynb'
ROI_Sources = [str(Path(base_dir,'DsDA', 'DsDA_Beer', 'beer_t10_simulator_files'))]

# minimum MS1 intensity of chemicals
min_ms1_intensity = 1.75E5

# m/z and RT range of chemicals
rt_range = [(0, 1440)]
mz_range = [(0, 1050)]

# the number of chemicals in the sample
n_chems = 6500

# maximum MS level (we do not generate fragmentation peaks when this value is 1)
ms_level = 1
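
The effect of these parameters can be sketched as a simple filter over candidate chemicals. The code below is a hedged illustration only and does not reproduce how `ChemicalCreator` works internally: `ToyChemical`, `in_any_range` and `filter_chemicals` are hypothetical names, the candidate values are made up, and treating the intensity threshold as inclusive is an assumption.

```python
from typing import NamedTuple


class ToyChemical(NamedTuple):
    """Hypothetical minimal record describing one chemical."""
    formula: str
    mz: float
    rt: float
    max_intensity: float


def in_any_range(value, ranges):
    """True if value lies inside at least one (low, high) interval."""
    return any(low <= value <= high for low, high in ranges)


def filter_chemicals(chemicals, mz_range, rt_range, min_ms1_intensity):
    """Keep chemicals inside the m/z and RT ranges and above the intensity threshold."""
    return [
        c for c in chemicals
        if in_any_range(c.mz, mz_range)
        and in_any_range(c.rt, rt_range)
        and c.max_intensity >= min_ms1_intensity
    ]


candidates = [
    ToyChemical("C6H12O6", 181.07, 262.3, 1.9e5),
    ToyChemical("C27H46O", 387.36, 1500.0, 5.0e5),   # RT outside the range
    ToyChemical("C8H10N4O2", 195.09, 430.2, 1.0e4),  # below the intensity threshold
]
kept = filter_chemicals(candidates, [(0, 1050)], [(0, 1440)], 1.75e5)
```

Representing each range as a list of `(low, high)` tuples mirrors the shape of `rt_range` and `mz_range` above and allows several disjoint windows.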

In [10]:
chems = ChemicalCreator(ps, ROI_Sources, hmdb)
dataset = chems.sample(mz_range, rt_range, min_ms1_intensity, n_chems, ms_level)
save_obj(dataset, Path(out_dir, 'dataset.p'))

2019-12-12 11:23:56.330 | DEBUG    | vimms.Chemicals:__init__:239 - Sorting database compounds by masses
2019-12-12 11:24:00.573 | DEBUG    | vimms.Chemicals:sample:272 - 6500 chemicals to be created.
2019-12-12 11:24:01.289 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 0/6500
2019-12-12 11:24:05.648 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 500/6500
2019-12-12 11:24:09.752 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 1000/6500
2019-12-12 11:24:13.966 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 1500/6500
2019-12-12 11:24:18.393 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 2000/6500
2019-12-12 11:24:22.101 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 2500/6500
2019-12-12 11:24:27.012 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampling formula 3000/6500
2019-12-12 11:24:31.498 | DEBUG    | vimms.Chemicals:_sample_formulae:346 - Sampli

In [11]:
for chem in dataset[0:10]:
    print(chem)

KnownChemical - 'C11H11F3N2O4' rt=262.33 max_intensity=187966.49
KnownChemical - 'C30H50O3' rt=429.48 max_intensity=549845.79
KnownChemical - 'C14H19NO10S2' rt=510.49 max_intensity=904802.52
KnownChemical - 'C14H26N2O3S2' rt=551.37 max_intensity=211304.57
KnownChemical - 'C15H22FN3O6' rt=598.74 max_intensity=375567.61
KnownChemical - 'C7H17N3' rt=892.56 max_intensity=356541.82
KnownChemical - 'C10H8O6' rt=426.60 max_intensity=1425028.33
KnownChemical - 'C21H21O10' rt=430.23 max_intensity=271710.28
KnownChemical - 'C26H20O7' rt=311.66 max_intensity=1456420.83
KnownChemical - 'C2H6O5S' rt=212.72 max_intensity=518452.38


## Run MS1 controller on the samples and generate .mzML files

In [12]:
set_log_level_warning()

In [13]:
min_rt = rt_range[0][0]
max_rt = rt_range[0][1]

In [14]:
mass_spec = IndependentMassSpectrometer(POSITIVE, dataset, ps)
controller = SimpleMs1Controller()

In [15]:
# create an environment to run both the mass spec and controller
env = Environment(mass_spec, controller, min_rt, max_rt, progress_bar=True)

# set the log level to WARNING so we don't see too many messages when environment is running
set_log_level_warning()

# run the simulation
env.run()

(1440.911s) ms_level=1: 100%|█████████▉| 1439.5008199999984/1440 [00:59<00:00, 24.17it/s] 
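
Conceptually, the run advances simulated time from `min_rt` to `max_rt` while MS1 scans are acquired one after another. The sketch below mimics only that timing logic under a strong simplification: the function `run_fullscan` and the fixed `scan_duration` of 0.5 s are hypothetical, and the scan durations used by the actual ViMMS simulation may differ.

```python
def run_fullscan(min_rt, max_rt, scan_duration):
    """Return the start times of consecutive MS1 scans in [min_rt, max_rt)."""
    scan_times = []
    scan_id = 0
    current_rt = min_rt
    while current_rt < max_rt:
        scan_times.append(current_rt)
        scan_id += 1
        # recompute from the scan index to avoid accumulating rounding error
        current_rt = min_rt + scan_id * scan_duration
    return scan_times


scan_times = run_fullscan(min_rt=0, max_rt=1440, scan_duration=0.5)
```

With these assumed values the loop produces one scan every half second until the end of the retention time range.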


The simulated results are written to the .mzML file below, which can be opened in [TOPPView](https://pubs.acs.org/doi/abs/10.1021/pr900171m) or any other mzML viewer.

In [16]:
set_log_level_debug()
mzml_filename = 'ms1_controller.mzML'
env.write_mzML(out_dir, mzml_filename)

2019-12-12 11:26:31.002 | DEBUG    | vimms.Environment:write_mzML:142 - Writing mzML file to /home/joewandy/git/vimms/examples/example_data/results/MS1_single/ms1_controller.mzML
2019-12-12 11:26:34.475 | DEBUG    | vimms.Environment:write_mzML:149 - mzML file successfully written!
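
As a quick sanity check on a written file, the number of spectra can be counted with the standard library alone, since mzML is an XML format. This is a minimal sketch: `count_spectra` is a hypothetical helper, and the embedded document is a tiny hand-written stand-in rather than real ViMMS output.

```python
import io
import xml.etree.ElementTree as ET


def count_spectra(source):
    """Count <spectrum> elements in an mzML document (file path or binary file object)."""
    count = 0
    for _, element in ET.iterparse(source, events=("end",)):
        # Strip the XML namespace, e.g. '{http://psi.hupo.org/ms/mzml}spectrum'.
        if element.tag.rsplit("}", 1)[-1] == "spectrum":
            count += 1
            element.clear()  # release memory when parsing large files
    return count


# Tiny stand-in document, not real simulator output.
toy_mzml = b"""<?xml version="1.0" encoding="utf-8"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="toy">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1" defaultArrayLength="0"/>
      <spectrum index="1" id="scan=2" defaultArrayLength="0"/>
    </spectrumList>
  </run>
</mzML>"""
n_spectra = count_spectra(io.BytesIO(toy_mzml))
```

For the file written above, the same helper could be called as `count_spectra(str(Path(out_dir, mzml_filename)))`.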
