### Experiment: Varying N in top-N DDA fragmentation

We demonstrate that the simulator can be used for scan-level closed-loop DDA experiments.
- Take an existing dataset and find out which MS1 peaks are linked to which MS2 peaks.
- Run all MS1 peaks through the simulator's Top-N protocol.
- If N is greater than in the real data, do we see the MS1 peaks from the first step being fragmented again, plus additional fragment peaks?
- Can we use the simulator to find a new N that maximises the number of MS1 peaks being fragmented?
- Verify the results on an actual machine.
- Talk to Stefan about machine time.
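
The question of which N fragments the most MS1 peaks can be illustrated with a toy model before the simulator is involved. The sketch below is not the `VMSfunctions` implementation: `Peak`, `count_fragmented`, the scan times and the peaks themselves are all made up for illustration, and it assumes that each peak is fragmented at most once and that every scan takes a fixed amount of time.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    mz: float
    rt_start: float   # time at which the peak starts eluting
    rt_end: float     # time at which the peak stops eluting
    intensity: float


def count_fragmented(peaks: List[Peak], n: int, max_rt: float,
                     ms1_time: float = 1.0, ms2_time: float = 1.0) -> int:
    """Return how many distinct peaks a toy Top-N duty cycle fragments."""
    fragmented = set()
    rt = 0.0
    while rt < max_rt:
        # MS1 survey scan: peaks eluting now that were not fragmented before
        candidates = [i for i, p in enumerate(peaks)
                      if p.rt_start <= rt <= p.rt_end and i not in fragmented]
        # keep the n most intense candidates as precursors
        candidates.sort(key=lambda i: peaks[i].intensity, reverse=True)
        chosen = candidates[:n]
        fragmented.update(chosen)
        # the cycle costs one MS1 scan plus one MS2 scan per precursor
        rt += ms1_time + len(chosen) * ms2_time
    return len(fragmented)


# Made-up peaks, not taken from the beer data
toy_peaks = [
    Peak(mz=100.0, rt_start=0.0, rt_end=2.0, intensity=100.0),
    Peak(mz=150.0, rt_start=0.0, rt_end=2.0, intensity=90.0),
    Peak(mz=200.0, rt_start=0.0, rt_end=2.0, intensity=80.0),
    Peak(mz=250.0, rt_start=5.0, rt_end=6.0, intensity=70.0),
]

# Sweep N and record the number of fragmented peaks for each value
coverage = {n: count_fragmented(toy_peaks, n, max_rt=10.0) for n in range(1, 6)}
best_n = max(coverage, key=coverage.get)
print(coverage, best_n)
```

On these toy peaks, N = 3 is the smallest value that fragments all four peaks, because three of them co-elute and disappear before a second survey scan can pick up the remainder. Whether the same pattern holds on real data is exactly what the experiment is meant to find out.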

In [1]:
%matplotlib inline

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import numpy as np
import pandas as pd
import sys
import scipy.stats
import pylab as plt
from IPython import display

In [4]:
sys.path.append('../codes')

In [5]:
from VMSfunctions.Chemicals import *
from VMSfunctions.Chromatograms import *
from VMSfunctions.MassSpec import *
from VMSfunctions.Controller import *
from VMSfunctions.Common import *
from VMSfunctions.DataGenerator import *

Load the densities trained on four beer datasets (see [loader_kde](loader_kde.ipynb)).

In [6]:
ps = load_obj('../models/peak_sampler_4_beers.p')

Load the chromatogram data exported from the real data.

In [20]:
xcms_output = '../models/beer_ms1_peaks.csv.gz'
cc = ChromatogramCreator(xcms_output)

DEBUG:Chemicals:Loading 0 chromatograms
DEBUG:Chemicals:Loading 5000 chromatograms
DEBUG:Chemicals:Loading 10000 chromatograms
DEBUG:Chemicals:Loading 15000 chromatograms
DEBUG:Chemicals:Loading 20000 chromatograms
DEBUG:Chemicals:Loading 25000 chromatograms
DEBUG:Chemicals:Loading 30000 chromatograms
DEBUG:Chemicals:Loading 35000 chromatograms
DEBUG:Chemicals:Loading 40000 chromatograms
DEBUG:Chemicals:Loading 45000 chromatograms


In [31]:
chemicals = ChemicalCreator(ps, cc)
dataset = chemicals.sample_from_chromatograms(2)

DEBUG:Chemicals:47141 ms1 peaks to be created.
DEBUG:Chemicals:i = 0
DEBUG:Chemicals:i = 2500
DEBUG:Chemicals:i = 5000
DEBUG:Chemicals:i = 7500
DEBUG:Chemicals:i = 10000
DEBUG:Chemicals:i = 12500
DEBUG:Chemicals:i = 15000
DEBUG:Chemicals:i = 17500
DEBUG:Chemicals:i = 20000
DEBUG:Chemicals:i = 22500
DEBUG:Chemicals:i = 25000
DEBUG:Chemicals:i = 27500
DEBUG:Chemicals:i = 30000
DEBUG:Chemicals:i = 32500
DEBUG:Chemicals:i = 35000
DEBUG:Chemicals:i = 37500
DEBUG:Chemicals:i = 40000
DEBUG:Chemicals:i = 42500
DEBUG:Chemicals:i = 45000


AttributeError: 'UnknownChemical' object has no attribute 'rts'

### Set up a Top-N controller

In [40]:
max_rt = 100                    # the maximum retention time of scans to generate
N = 5                           # top-5 DDA fragmentation
mz_tol = 5                      # the mz isolation window around a selected precursor ion
rt_tol = 15                     # the rt window around a selected precursor ion to prevent it from being fragmented multiple times
min_ms2_intensity = 5000        # the minimum ms2 peak intensity
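
How `mz_tol` and `rt_tol` might combine into an exclusion check is sketched below. This is an illustration only, not the `TopNController` code: the function `is_excluded` is hypothetical, and the sketch assumes that `mz_tol` is expressed in ppm and `rt_tol` in the same time unit as the scan retention times, neither of which is stated in this notebook.

```python
from typing import List, Tuple

# (mz, rt) pairs of precursor ions that have already been fragmented
ExclusionList = List[Tuple[float, float]]


def is_excluded(mz: float, rt: float, exclusion: ExclusionList,
                mz_tol_ppm: float = 5.0, rt_tol: float = 15.0) -> bool:
    """Return True if (mz, rt) falls inside the window of an earlier precursor."""
    for ex_mz, ex_rt in exclusion:
        # assumption: the m/z tolerance is relative, in parts per million
        mz_match = abs(mz - ex_mz) <= ex_mz * mz_tol_ppm / 1e6
        # assumption: the rt tolerance is absolute and symmetric
        rt_match = abs(rt - ex_rt) <= rt_tol
        if mz_match and rt_match:
            return True
    return False


exclusion = [(200.0, 10.0)]
print(is_excluded(200.0005, 20.0, exclusion))  # close in both m/z and rt
print(is_excluded(200.0005, 30.0, exclusion))  # same m/z, outside the rt window
```

A candidate ion is skipped only when both conditions hold, so an ion with the same m/z can be selected again once it elutes outside the retention-time window.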

In [41]:
mass_spec = IndependentMassSpectrometer(POSITIVE, dataset, density=ps.density_estimator)
controller = TopNController(mass_spec, N, mz_tol, rt_tol, min_ms2_intensity=min_ms2_intensity)

set_log_level_info()
controller.make_plot = False

#set_log_level_debug()
#controller.make_plot = True

controller.run(max_rt)

INFO:TopNController:Acquisition open
INFO:TopNController:Received Scan 21 num_peaks=8 rt=3.18 ms_level=1
INFO:TopNController:Received Scan 22 num_peaks=10 rt=3.27 ms_level=2
INFO:TopNController:Received Scan 23 num_peaks=30 rt=3.41 ms_level=2
INFO:TopNController:Received Scan 24 num_peaks=10 rt=3.55 ms_level=2
INFO:TopNController:Received Scan 25 num_peaks=10 rt=3.70 ms_level=2
INFO:TopNController:Received Scan 26 num_peaks=10 rt=3.85 ms_level=2
INFO:TopNController:Received Scan 27 num_peaks=22 rt=4.00 ms_level=1
INFO:TopNController:Received Scan 28 num_peaks=10 rt=4.08 ms_level=2
INFO:TopNController:Received Scan 29 num_peaks=10 rt=4.22 ms_level=2
INFO:TopNController:Received Scan 30 num_peaks=10 rt=4.36 ms_level=2
INFO:TopNController:Received Scan 31 num_peaks=30 rt=4.66 ms_level=2
INFO:TopNController:Received Scan 32 num_peaks=146 rt=4.78 ms_level=1
INFO:TopNController:Received Scan 33 num_peaks=10 rt=4.90 ms_level=2
INFO:TopNController:Received Scan 34 num_peaks=10 rt=5.04 ms_level

INFO:TopNController:Received Scan 137 num_peaks=217 rt=18.31 ms_level=1
INFO:TopNController:Received Scan 138 num_peaks=216 rt=18.40 ms_level=1
INFO:TopNController:Received Scan 139 num_peaks=216 rt=18.50 ms_level=1
INFO:TopNController:Received Scan 140 num_peaks=216 rt=18.58 ms_level=1
INFO:TopNController:Received Scan 141 num_peaks=216 rt=18.68 ms_level=1
INFO:TopNController:Received Scan 142 num_peaks=216 rt=18.81 ms_level=1
INFO:TopNController:Received Scan 143 num_peaks=212 rt=18.88 ms_level=1
INFO:TopNController:Received Scan 144 num_peaks=206 rt=18.94 ms_level=1
INFO:TopNController:Received Scan 145 num_peaks=211 rt=19.13 ms_level=1
INFO:TopNController:Received Scan 146 num_peaks=204 rt=19.17 ms_level=1
INFO:TopNController:Received Scan 147 num_peaks=213 rt=19.35 ms_level=1
INFO:TopNController:Received Scan 148 num_peaks=209 rt=19.72 ms_level=1
INFO:TopNController:Received Scan 149 num_peaks=209 rt=19.82 ms_level=1
INFO:TopNController:Received Scan 150 num_peaks=10 rt=19.95 ms_l

INFO:TopNController:Received Scan 251 num_peaks=229 rt=32.93 ms_level=1
INFO:TopNController:Received Scan 252 num_peaks=229 rt=33.05 ms_level=1
INFO:TopNController:Received Scan 253 num_peaks=223 rt=33.38 ms_level=1
INFO:TopNController:Received Scan 254 num_peaks=220 rt=33.54 ms_level=1
INFO:TopNController:Received Scan 255 num_peaks=223 rt=33.61 ms_level=1
INFO:TopNController:Received Scan 256 num_peaks=221 rt=33.67 ms_level=1
INFO:TopNController:Received Scan 257 num_peaks=218 rt=33.93 ms_level=1
INFO:TopNController:Received Scan 258 num_peaks=218 rt=34.00 ms_level=1
INFO:TopNController:Received Scan 259 num_peaks=217 rt=34.06 ms_level=1
INFO:TopNController:Received Scan 260 num_peaks=217 rt=34.13 ms_level=1
INFO:TopNController:Received Scan 261 num_peaks=217 rt=34.20 ms_level=1
INFO:TopNController:Received Scan 262 num_peaks=217 rt=34.27 ms_level=1
INFO:TopNController:Received Scan 263 num_peaks=217 rt=34.45 ms_level=1
INFO:TopNController:Received Scan 264 num_peaks=217 rt=34.52 ms_

INFO:TopNController:Received Scan 365 num_peaks=227 rt=47.60 ms_level=1
INFO:TopNController:Received Scan 366 num_peaks=227 rt=47.67 ms_level=1
INFO:TopNController:Received Scan 367 num_peaks=226 rt=47.74 ms_level=1
INFO:TopNController:Received Scan 368 num_peaks=224 rt=47.82 ms_level=1
INFO:TopNController:Received Scan 369 num_peaks=225 rt=47.98 ms_level=1
INFO:TopNController:Received Scan 370 num_peaks=223 rt=48.05 ms_level=1
INFO:TopNController:Received Scan 371 num_peaks=224 rt=48.30 ms_level=1
INFO:TopNController:Received Scan 372 num_peaks=224 rt=48.42 ms_level=1
INFO:TopNController:Received Scan 373 num_peaks=223 rt=48.75 ms_level=1
INFO:TopNController:Received Scan 374 num_peaks=226 rt=48.91 ms_level=1
INFO:TopNController:Received Scan 375 num_peaks=227 rt=49.23 ms_level=1
INFO:TopNController:Received Scan 376 num_peaks=230 rt=49.31 ms_level=1
INFO:TopNController:Received Scan 377 num_peaks=234 rt=49.38 ms_level=1
INFO:TopNController:Received Scan 378 num_peaks=234 rt=49.48 ms_

INFO:TopNController:Received Scan 479 num_peaks=243 rt=61.89 ms_level=1
INFO:TopNController:Received Scan 480 num_peaks=243 rt=61.98 ms_level=1
INFO:TopNController:Received Scan 481 num_peaks=243 rt=62.05 ms_level=1
INFO:TopNController:Received Scan 482 num_peaks=243 rt=62.12 ms_level=1
INFO:TopNController:Received Scan 483 num_peaks=240 rt=62.18 ms_level=1
INFO:TopNController:Received Scan 484 num_peaks=239 rt=62.25 ms_level=1
INFO:TopNController:Received Scan 485 num_peaks=239 rt=62.33 ms_level=1
INFO:TopNController:Received Scan 486 num_peaks=240 rt=62.52 ms_level=1
INFO:TopNController:Received Scan 487 num_peaks=238 rt=62.71 ms_level=1
INFO:TopNController:Received Scan 488 num_peaks=236 rt=63.07 ms_level=1
INFO:TopNController:Received Scan 489 num_peaks=236 rt=63.15 ms_level=1
INFO:TopNController:Received Scan 490 num_peaks=236 rt=63.22 ms_level=1
INFO:TopNController:Received Scan 491 num_peaks=234 rt=63.42 ms_level=1
INFO:TopNController:Received Scan 492 num_peaks=227 rt=63.55 ms_

INFO:TopNController:Received Scan 593 num_peaks=215 rt=78.05 ms_level=1
INFO:TopNController:Received Scan 594 num_peaks=211 rt=78.18 ms_level=1
INFO:TopNController:Received Scan 595 num_peaks=209 rt=78.26 ms_level=1
INFO:TopNController:Received Scan 596 num_peaks=216 rt=78.32 ms_level=1
INFO:TopNController:Received Scan 597 num_peaks=215 rt=78.69 ms_level=1
INFO:TopNController:Received Scan 598 num_peaks=214 rt=78.74 ms_level=1
INFO:TopNController:Received Scan 599 num_peaks=211 rt=78.93 ms_level=1
INFO:TopNController:Received Scan 600 num_peaks=211 rt=79.21 ms_level=1
INFO:TopNController:Received Scan 601 num_peaks=216 rt=79.38 ms_level=1
INFO:TopNController:Received Scan 602 num_peaks=215 rt=79.45 ms_level=1
INFO:TopNController:Received Scan 603 num_peaks=214 rt=79.61 ms_level=1
INFO:TopNController:Received Scan 604 num_peaks=220 rt=79.77 ms_level=1
INFO:TopNController:Received Scan 605 num_peaks=218 rt=79.83 ms_level=1
INFO:TopNController:Received Scan 606 num_peaks=222 rt=79.90 ms_

INFO:TopNController:Received Scan 707 num_peaks=219 rt=93.18 ms_level=1
INFO:TopNController:Received Scan 708 num_peaks=219 rt=93.24 ms_level=1
INFO:TopNController:Received Scan 709 num_peaks=218 rt=93.34 ms_level=1
INFO:TopNController:Received Scan 710 num_peaks=218 rt=93.42 ms_level=1
INFO:TopNController:Received Scan 711 num_peaks=218 rt=93.48 ms_level=1
INFO:TopNController:Received Scan 712 num_peaks=215 rt=93.74 ms_level=1
INFO:TopNController:Received Scan 713 num_peaks=214 rt=93.92 ms_level=1
INFO:TopNController:Received Scan 714 num_peaks=212 rt=93.99 ms_level=1
INFO:TopNController:Received Scan 715 num_peaks=213 rt=94.19 ms_level=1
INFO:TopNController:Received Scan 716 num_peaks=216 rt=94.27 ms_level=1
INFO:TopNController:Received Scan 717 num_peaks=212 rt=94.66 ms_level=1
INFO:TopNController:Received Scan 718 num_peaks=212 rt=94.72 ms_level=1
INFO:TopNController:Received Scan 719 num_peaks=212 rt=94.79 ms_level=1
INFO:TopNController:Received Scan 720 num_peaks=212 rt=94.95 ms_
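
The controller log above can be summarised by counting scans per MS level. The sketch below is a hypothetical helper (`summarise_log` is not part of `VMSfunctions`) that assumes the log line layout shown in this output and skips any line that was truncated before the `ms_level` value.

```python
import re
from collections import Counter
from typing import Iterable

# Matches complete lines such as
# "INFO:TopNController:Received Scan 21 num_peaks=8 rt=3.18 ms_level=1"
SCAN_RE = re.compile(
    r"Received Scan (\d+) num_peaks=(\d+) rt=([\d.]+) ms_level=(\d+)"
)


def summarise_log(lines: Iterable[str]) -> Counter:
    """Count the received scans per MS level, ignoring incomplete lines."""
    counts = Counter()
    for line in lines:
        match = SCAN_RE.search(line)
        if match is None:
            continue  # truncated or unrelated line
        counts[int(match.group(4))] += 1
    return counts


# A few lines copied from the output above; the last one is truncated
sample = [
    "INFO:TopNController:Received Scan 21 num_peaks=8 rt=3.18 ms_level=1",
    "INFO:TopNController:Received Scan 22 num_peaks=10 rt=3.27 ms_level=2",
    "INFO:TopNController:Received Scan 23 num_peaks=30 rt=3.41 ms_level=2",
    "INFO:TopNController:Received Scan 34 num_peaks=10 rt=5.04 ms_level",
]
print(summarise_log(sample))
```

Comparing the MS1 and MS2 counts across runs with different N would give a quick check of how the duty cycle changes, provided the full log is captured rather than the clipped output shown here.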