### Experiment: Varying N in top-N DDA fragmentation

We demonstrate that the simulator can be used for scan-level closed-loop DDA experiments:
- Take an existing dataset and find out which MS1 peaks are linked to which MS2 peaks.
- Run all MS1 peaks through the simulator’s Top-N protocol.
- If N is greater than the value used in the real data, do we see the same MS1 peaks from the first step being fragmented again, plus additional fragment peaks?
- Can we use the simulator to find a new N that maximises the number of MS1 peaks being fragmented?
- Verify the results on the actual machine.
- Talk to Stefan about machine time.
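
The sweep over N described above can be sketched with a toy Top-N loop that is independent of the simulator. Everything in it is an assumption made for illustration: the `Peak` class, the `top_n_events` function, the ppm-based exclusion window, and the rule that an excluded peak is skipped in favour of the next most intense one. It is not the `TopNController` implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float


def top_n_events(scans, n, mz_tol_ppm=5.0, rt_tol=15.0):
    """Toy Top-N DDA: return the (rt, mz) fragmentation events.

    scans is a list of (rt, [Peak, ...]) MS1 scans ordered by rt.
    """
    exclusion = []  # active dynamic exclusion windows: (mz_lo, mz_hi, rt_until)
    events = []
    for rt, peaks in scans:
        # drop exclusion windows that have expired
        exclusion = [w for w in exclusion if w[2] >= rt]
        picked = 0
        for peak in sorted(peaks, key=lambda p: p.intensity, reverse=True):
            if picked == n:
                break
            if any(lo <= peak.mz <= hi for lo, hi, _ in exclusion):
                continue  # assumption: skip excluded ions, try the next peak
            delta = peak.mz * mz_tol_ppm / 1e6
            exclusion.append((peak.mz - delta, peak.mz + delta, rt + rt_tol))
            events.append((rt, peak.mz))
            picked += 1
    return events


# three identical MS1 scans, each with four peaks (made-up values)
peaks = [Peak(100.0, 4e5), Peak(200.0, 3e5), Peak(300.0, 2e5), Peak(400.0, 1e5)]
scans = [(0.0, peaks), (1.0, peaks), (2.0, peaks)]

# number of distinct precursors fragmented for each candidate N
counts = {n: len({mz for _, mz in top_n_events(scans, n)}) for n in (1, 2, 5)}
print(counts)
```

On this toy input, raising N beyond the number of distinct peaks per scan gives no further gain, which is the kind of saturation the experiment is meant to locate on real data.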

In [1]:
%matplotlib inline

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import numpy as np
import pandas as pd
import sys
import scipy.stats
import pylab as plt
from IPython import display

In [4]:
sys.path.append('../codes')

In [5]:
from VMSfunctions.Chemicals import *
from VMSfunctions.Chromatograms import *
from VMSfunctions.MassSpec import *
from VMSfunctions.Controller import *
from VMSfunctions.Common import *
from VMSfunctions.DataGenerator import *

Load the densities trained on four beer datasets (see [loader_kde](loader_kde.ipynb)).

In [6]:
ps = load_obj('../models/peak_sampler_4_beers.p')

Load the chromatogram data exported from the real data.

In [7]:
xcms_output = '../models/beer_ms1_peaks.csv.gz'
cc = ChromatogramCreator(xcms_output)

DEBUG:Chemicals:Loading 0 chromatograms

divide by zero encountered in double_scalars

DEBUG:Chemicals:Loading 5000 chromatograms
DEBUG:Chemicals:Loading 10000 chromatograms
DEBUG:Chemicals:Loading 15000 chromatograms
DEBUG:Chemicals:Loading 20000 chromatograms
DEBUG:Chemicals:Loading 25000 chromatograms
DEBUG:Chemicals:Loading 30000 chromatograms
DEBUG:Chemicals:Loading 35000 chromatograms
DEBUG:Chemicals:Loading 40000 chromatograms
DEBUG:Chemicals:Loading 45000 chromatograms


In [8]:
cc.chemicals

array([UnknownChemical mz=118.0862 rt=531.71 max_intensity=1723984512.00,
       UnknownChemical mz=118.0862 rt=531.55 max_intensity=1642510080.00,
       UnknownChemical mz=116.0705 rt=577.81 max_intensity=2235291136.00,
       ..., UnknownChemical mz=231.1244 rt=1393.23 max_intensity=5235.98,
       UnknownChemical mz=237.0524 rt=1396.85 max_intensity=6229.18,
       UnknownChemical mz=176.9613 rt=1390.25 max_intensity=4970.76],
      dtype=object)

### Set up a Top-N controller

In [14]:
max_rt = 10                     # the maximum retention time of scans to generate
N = 5                           # top-5 DDA fragmentation
mz_tol = 5                      # the m/z isolation window (in ppm) around a selected precursor ion
rt_tol = 15                     # the rt window around a selected precursor ion, to prevent it from being fragmented multiple times
min_ms2_intensity = 5000        # the minimum MS2 peak intensity
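
As a rough check on how `mz_tol` and `rt_tol` translate into windows, the sketch below reproduces the numbers seen in the controller's debug log. It assumes that `mz_tol` is in parts per million and that the lower rt bound is clipped at zero; both are inferred from the log, not taken from the `TopNController` source. The helper names `isolation_window` and `exclusion_rt_range` are hypothetical.

```python
def isolation_window(mz, mz_tol_ppm):
    """Return the (lower, upper) m/z bounds for a ppm tolerance."""
    delta = mz * mz_tol_ppm / 1e6
    return mz - delta, mz + delta


def exclusion_rt_range(rt, rt_tol):
    """Return the (from_rt, to_rt) dynamic exclusion range, clipped at 0."""
    return max(0.0, rt - rt_tol), rt + rt_tol


# precursor m/z and scan rt as printed (rounded) in the log
mz_lo, mz_hi = isolation_window(237.8684, 5)
rt_from, rt_to = exclusion_rt_range(3.19, 15)
print(f"window ({mz_lo:.4f}, {mz_hi:.4f}) from_rt {rt_from:.2f} to_rt {rt_to:.2f}")
```

Because the log prints the precursor m/z rounded to four decimals, the bounds computed here can differ from the logged ones in the last digit.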

In [17]:
mass_spec = IndependentMassSpectrometer(POSITIVE, cc.chemicals, density=ps.density_estimator)
controller = TopNController(mass_spec, N, mz_tol, rt_tol, min_ms2_intensity=min_ms2_intensity)

# set_log_level_info()
# controller.make_plot = False

set_log_level_debug()
controller.make_plot = True

controller.run(max_rt)

INFO:TopNController:Acquisition open
INFO:TopNController:Received Scan 24 num_peaks=11 rt=3.19 ms_level=1
DEBUG:TopNController:Isolated precursor ion 237.8684 window (237.8672, 237.8695)
DEBUG:TopNController:Dynamic exclusion from_mz 237.8672 to_mz 237.8695 from_rt 0.00 to_rt 18.19
DEBUG:TopNController:Isolated precursor ion 171.1986 window (171.1977, 171.1994)
DEBUG:TopNController:Dynamic exclusion from_mz 171.1977 to_mz 171.1994 from_rt 0.00 to_rt 18.19
DEBUG:TopNController:Isolated precursor ion 231.0894 window (231.0883, 231.0906)
DEBUG:TopNController:Dynamic exclusion from_mz 231.0883 to_mz 231.0906 from_rt 0.00 to_rt 18.19
DEBUG:TopNController:Isolated precursor ion 539.7347 window (539.7320, 539.7374)
DEBUG:TopNController:Dynamic exclusion from_mz 539.7320 to_mz 539.7374 from_rt 0.00 to_rt 18.19
DEBUG:TopNController:Isolated precursor ion 203.1141 window (203.1131, 203.1151)
DEBUG:TopNController:Dynamic exclusion from_mz 203.1131 to_mz 203.1151 from_rt 0.00 to_rt 18.19
...

INFO:TopNController:Received Scan 55 num_peaks=249 rt=7.58 ms_level=1
DEBUG:TopNController:Excluded precursor ion mz 159.8941 rt 7.58
DEBUG:TopNController:Excluded precursor ion mz 175.8680 rt 7.58
DEBUG:TopNController:Excluded precursor ion mz 177.8661 rt 7.58
DEBUG:TopNController:Excluded precursor ion mz 176.0949 rt 7.58
DEBUG:TopNController:Excluded precursor ion mz 198.1620 rt 7.58
INFO:TopNController:Received Scan 56 num_peaks=249 rt=7.75 ms_level=1
DEBUG:TopNController:Excluded precursor ion mz 159.8941 rt 7.75
DEBUG:TopNController:Excluded precursor ion mz 175.8680 rt 7.75
DEBUG:TopNController:Excluded precursor ion mz 177.8661 rt 7.75
DEBUG:TopNController:Excluded precursor ion mz 176.0949 rt 7.75
DEBUG:TopNController:Excluded precursor ion mz 198.1620 rt 7.75
INFO:TopNController:Received Scan 57 num_peaks=249 rt=7.83 ms_level=1
DEBUG:TopNController:Excluded precursor ion mz 159.8941 rt 7.83
DEBUG:TopNController:Excluded precursor ion mz 175.8680 rt 7.83
...

DEBUG:TopNController:Excluded precursor ion mz 159.8941 rt 9.85
DEBUG:TopNController:Excluded precursor ion mz 175.8680 rt 9.85
DEBUG:TopNController:Excluded precursor ion mz 177.8661 rt 9.85
DEBUG:TopNController:Excluded precursor ion mz 198.1620 rt 9.85
DEBUG:TopNController:Excluded precursor ion mz 176.0949 rt 9.85
INFO:TopNController:Received Scan 77 num_peaks=278 rt=9.97 ms_level=1
DEBUG:TopNController:Excluded precursor ion mz 159.8941 rt 9.97
DEBUG:TopNController:Excluded precursor ion mz 175.8680 rt 9.97
DEBUG:TopNController:Excluded precursor ion mz 177.8661 rt 9.97
DEBUG:TopNController:Excluded precursor ion mz 198.1620 rt 9.97
DEBUG:TopNController:Excluded precursor ion mz 176.0949 rt 9.97
INFO:TopNController:Acquisition closing
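
A debug log like the one above can be summarised by counting how often each precursor ion was isolated or excluded. The sketch below is a minimal example using only the standard library; the regular expression is written against the line formats shown in this output, and the `summarise` helper is a hypothetical name, not part of `VMSfunctions`.

```python
import re
from collections import Counter

# a few lines copied from the controller output
LOG = """\
INFO:TopNController:Received Scan 24 num_peaks=11 rt=3.19 ms_level=1
DEBUG:TopNController:Isolated precursor ion 237.8684 window (237.8672, 237.8695)
DEBUG:TopNController:Dynamic exclusion from_mz 237.8672 to_mz 237.8695 from_rt 0.00 to_rt 18.19
DEBUG:TopNController:Isolated precursor ion 171.1986 window (171.1977, 171.1994)
DEBUG:TopNController:Excluded precursor ion mz 159.8941 rt 7.58
DEBUG:TopNController:Excluded precursor ion mz 175.8680 rt 7.58
DEBUG:TopNController:Excluded precursor ion mz 159.8941 rt 7.75
DEBUG:TopNController:Excluded precursor ion mz 175.8680 rt 7.75
"""

PATTERN = re.compile(
    r"^DEBUG:TopNController:(Isolated|Excluded) precursor ion (?:mz )?(\d+\.\d+)"
)


def summarise(log_text):
    """Count isolation and exclusion events per precursor m/z."""
    isolated, excluded = Counter(), Counter()
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match is None:
            continue  # INFO lines and exclusion-window lines are ignored
        kind, mz = match.group(1), float(match.group(2))
        (isolated if kind == "Isolated" else excluded)[mz] += 1
    return isolated, excluded


isolated, excluded = summarise(LOG)
print("isolated:", dict(isolated))
print("excluded:", dict(excluded))
```

Ions that are excluded in many consecutive scans, such as m/z 159.8941 here, are the ones repeatedly blocked by the dynamic exclusion window.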
