dealing with different fragments in different fragmentation methods #71

pavel-shliaha · 2018-01-26T21:13:28Z

I understand that it might be a big problem to make it work, but we need to be able to combine datasets with different (partially overlapping or non-overlapping) fragment types. The reason for this bizarre request is that UVPD data tends to lead to some proton rearrangements. Please see this passage from PMID: 29336549:

We noticed, however, that there was a poor fit to the a-, x-, and y-ion isotopic envelopes in the non-HDX experiments, and that the inclusion of a+1, a+2, x+1, x+2, y-1 or y-2 ions (i.e., fragment ions which had gained or lost 1 or 2 Da) into the analysis only partially alleviated this problem. It appeared that, in multiple cases, the a-type fragment ions are actually mixtures of a, a+1, a+2, the x-type fragment ions are mixtures of x, x+1, x+2, and the y-type fragment ions are mixtures of y, y-1, and y-2 fragment ions, in varying proportions.

In practice this means that I have to read in a dataset that contains UVPD spectra with all of these fragments, even though I know they are not present in the other fragmentation methods. Having so many fragments really decreases the accuracy of matching.

Hence we need to have a way to specify different fragment and adduct types for different fragmentation methods.

While it might take some time to decide how best to implement this, I wanted to ask you to create a method for combining NCBSets. I could then easily match the different datasets (e.g. UVPD and ETD) against different fragments, convert them to NCBSets, and combine them. This is currently impossible, because combine returns the following error:

combine(CANCB, CAUVPDNCB)
# Order of conditions changed.
# Error in (function (classes, fdef, mtable)  : 
#  unable to find an inherited method for function ‘updateMedianInjectionTime’ for signature ‘"NCBSet"’

sgibb · 2018-01-27T10:15:09Z

I understand the problem. Currently there is no easy solution. I fixed the combine method so that it works for both TopDownSet and NCBSet. However, it currently expects the rowViews (fragments) to be identical, so it is in fact a simple cbind with some recalculations (median injection time, etc.).
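The error reported above is an S4 dispatch failure, and the commit referenced in this issue mentions changing the signature of updateMedianInjectionTime to AbstractTopDownSet. The toy sketch below shows that kind of fix in general terms, assuming two concrete classes share a virtual parent. All class and function names in it are hypothetical; it is not topdownr's code.

```r
library("methods")

## Toy classes (hypothetical names, not the topdownr definitions).
setClass("ToyAbstractSet", contains = "VIRTUAL",
         slots = c(injectionTime = "numeric"))
setClass("ToyTopDownSet", contains = "ToyAbstractSet")
setClass("ToyNCBSet", contains = "ToyAbstractSet")

setGeneric("toyUpdateMedian",
           function(object) standardGeneric("toyUpdateMedian"))

## The method is registered for one concrete class only.
setMethod("toyUpdateMedian", "ToyTopDownSet", function(object) {
    median(object@injectionTime)
})

ncb <- new("ToyNCBSet", injectionTime = c(10, 20, 60))

## Dispatch fails: there is no method for ToyNCBSet or its parent class.
before <- tryCatch(toyUpdateMedian(ncb), error = function(e) "no method")

## Registering the method on the shared parent covers both subclasses.
setMethod("toyUpdateMedian", "ToyAbstractSet", function(object) {
    median(object@injectionTime)
})
after <- toyUpdateMedian(ncb)
```

With the method defined on the virtual parent, any subclass inherits it unless it registers a more specific method of its own.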

I don't yet have an idea for a clean readTopDownFiles interface that supports different adducts for different fragmentation methods. We could add a column to the adducts data.frame (e.g. fragmentationMethod), I could provide a filterFragments function, or I could modify the combine method to allow different fragments.
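To make the idea of a fragmentationMethod column concrete, here is a minimal base R sketch. The fragmentationMethod column and the adductsFor helper are hypothetical and are not part of readTopDownFiles; the adduct masses, targets and names are the ones used elsewhere in this thread.

```r
## Adduct table with an extra, hypothetical fragmentationMethod column.
H <- 1.0078250321
adducts <- data.frame(
    mass = c(-H, H, 1, 2, 1, 2, -1, -2),
    to = c("c", "z", "a", "a", "x", "x", "y", "y"),
    name = c("cmH", "zpH", "ap1", "ap2", "xp1", "xp2", "ym1", "ym2"),
    fragmentationMethod = c("ETD", "ETD", rep("UVPD", 6)),
    stringsAsFactors = FALSE
)

## Hypothetical helper: keep only the adducts for one fragmentation method
## and drop the extra column, leaving the mass/to/name layout used in the
## readTopDownFiles calls in this thread.
adductsFor <- function(adducts, method) {
    keep <- adducts$fragmentationMethod == method
    adducts[keep, c("mass", "to", "name"), drop = FALSE]
}

etdAdducts <- adductsFor(adducts, "ETD")
uvpdAdducts <- adductsFor(adducts, "UVPD")
```

Each subset could then be passed as the adducts argument of a separate readTopDownFiles call, one per fragmentation method.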

pavel-shliaha · 2018-01-27T10:16:07Z

I have an idea which I think might address this for now. Let's say I want different fragment types for CID, UVPD and ETD, which are all in one file. I could:

1. Create three different TopDownSet objects: ETD (matched to ETD fragments: c and z), CID (matched to CID fragments: b and y) and UVPD (matched to UVPD fragments: a and x).
2. Subset the three objects so that the ETD object contains only ETD spectra, the CID object only CID spectra, and the UVPD object only UVPD spectra.
3. Merge them together.

This would, however, require the ability to merge TopDownSets with different fragment types. It would be much less messy than trying to load different spectra with different fragments.

sgibb · 2018-01-27T17:37:54Z

OK, combine now works for different fragments, too. You can call readTopDownFiles with different fragmentation types and different adducts and combine directly after reading the files. Afterwards, all of the filtering etc. should work on the combined object. (You are also free to do the whole preprocessing for each object separately and combine the resulting NCBSet objects.)

library("topdownr")
path <- "../topdownrdata/inst/extdata/20170703_ca/"

#' Default workflow

H <- 1.0078250321

ca <- readTopDownFiles(
    path = path,
    ## load fasta and ETD data
    pattern=".*fasta.gz$|ETDReagentTarget.*|missing.*",
    type = c("a", "b", "c", "x", "y", "z"),
    adducts = data.frame(
        mass=c(-H, H),
        to=c("c", "z"),
        name=c("cmH", "zpH")),
    modifications = "Met-loss",
    neutralLoss = NULL
)
# TopDownSet object (4.51 Mb)
# - - - Protein data - - -
# Amino acid sequence (259): SHHWGYGKHNGPEHWHKDFPIANGER...PELLMLANWRPAQPLKNRQVRGFPK
# Mass : 28946.66
# Modifications (1): Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 2064
# Theoretical fragment types (6): a, b, c, x, y, z
# Theoretical mass range: [60.04;28904.63]
# - - - Condition data - - -
# Number of conditions: 1853
# Number of scans: 6098
# Condition variables (60): File, Scan, ..., Sample, MedianIonInjectionTimeMs
# - - - Intensity data - - -
# Size of array: 2064x6098 (1.01% != 0)
# Number of matched fragments: 126648
# Intensity range: [40.30;3727119.75]
# - - - Processing information - - -
# [2018-01-27 18:33:12] 126648 fragments [2064;6098] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:12] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation. Order of conditions changed. 1853 conditions.
# [2018-01-27 18:33:12] Recalculate median injection time based on: Mz, AgcTarget.

cauvpd <- readTopDownFiles(
    path = path,
    ## load fasta and UVPD data
    pattern=".*fasta.gz$|UVPD.*",
    type = c("a", "b", "c", "x", "y", "z"),
    adducts = data.frame(
        mass=c(1, 2, 1, 2, -1, -2),
        to=c("a", "a", "x", "x", "y", "y"),
        name=c("ap1", "ap2", "xp1", "xp2", "ym1", "ym2")),
    modifications = "Met-loss",
    neutralLoss = NULL
)
# TopDownSet object (0.66 Mb)
# - - - Protein data - - -
# Amino acid sequence (259): SHHWGYGKHNGPEHWHKDFPIANGER...PELLMLANWRPAQPLKNRQVRGFPK
# Mass : 28946.66
# Modifications (1): Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 3096
# Theoretical fragment types (6): a, b, c, x, y, z
# Theoretical mass range: [60.04;28906.63]
# - - - Condition data - - -
# Number of conditions: 64
# Number of scans: 211
# Condition variables (42): File, Scan, ..., Sample, MedianIonInjectionTimeMs
# - - - Intensity data - - -
# Size of array: 3096x211 (0.85% != 0)
# Number of matched fragments: 5526
# Intensity range: [17.72;283448.38]
# - - - Processing information - - -
# [2018-01-27 18:33:13] 5526 fragments [3096;211] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:13] Condition names updated based on: Mz, AgcTarget, UvpdActivation. Order of conditions changed. 64 conditions.
# [2018-01-27 18:33:13] Recalculate median injection time based on: Mz, AgcTarget.

cacmb <- combine(ca, cauvpd)
cacmb
# TopDownSet object (4.91 Mb)
# - - - Protein data - - -
# Amino acid sequence (259): SHHWGYGKHNGPEHWHKDFPIANGER...PELLMLANWRPAQPLKNRQVRGFPK 
# Mass : 28946.66
# Modifications (1): Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 3612 
# Theoretical fragment types (6): a, b, c, x, y, z
# Theoretical mass range: [60.04;28906.63]
# - - - Condition data - - -
# Number of conditions: 1917 
# Number of scans: 6309 
# Condition variables (63): File, Scan, ..., UvpdActivation, UvpdTime
# - - - Intensity data - - -
# Size of array: 3612x6309 (0.58% != 0)
# Number of matched fragments: 132174 
# Intensity range: [17.72;3727119.75]
# - - - Processing information - - -
# [2018-01-27 18:33:12] 126648 fragments [2064;6098] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:12] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation. Order of conditions changed. 1853 conditions.
# [2018-01-27 18:33:12] Recalculate median injection time based on: Mz, AgcTarget.
# [2018-01-27 18:33:13] 5526 fragments [3096;211] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:13] Condition names updated based on: Mz, AgcTarget, UvpdActivation. Order of conditions changed. 64 conditions.
# [2018-01-27 18:33:13] Recalculate median injection time based on: Mz, AgcTarget.
# [2018-01-27 18:34:21] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation, UvpdActivation. Order of conditions changed. 1917 conditions.
# [2018-01-27 18:34:21] Recalculate median injection time based on: Mz, AgcTarget.
# [2018-01-27 18:34:21] Combined 126648 fragments [2064;6098] and 5526 fragments [3096;211] into a 132174 fragments [3612;6309] TopDownSet object.
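
The printed dimensions suggest that the combined object holds the union of the fragments of both inputs and all scans of both. The base R sketch below mimics that behaviour on toy matrices. combineByFragment is a hypothetical helper, not topdownr's implementation, and it assumes that column names are unique across the two inputs.

```r
## Hypothetical helper: column-bind two intensity matrices whose rows
## (fragments) only partially overlap; cells without a match are set to 0.
combineByFragment <- function(x, y) {
    fragments <- union(rownames(x), rownames(y))
    out <- matrix(
        0, nrow = length(fragments), ncol = ncol(x) + ncol(y),
        dimnames = list(fragments, c(colnames(x), colnames(y)))
    )
    out[rownames(x), colnames(x)] <- x
    out[rownames(y), colnames(y)] <- y
    out
}

## Toy data: "c1" is shared, "cmH1" and "ap1_1" are method specific.
etd <- matrix(c(5, 0, 7, 9), nrow = 2,
              dimnames = list(c("c1", "cmH1"), c("etd_scan1", "etd_scan2")))
uvpd <- matrix(c(3, 4), nrow = 2,
               dimnames = list(c("c1", "ap1_1"), "uvpd_scan1"))
combined <- combineByFragment(etd, uvpd)

## Counts taken from the printed objects above.
shared <- 2064 + 3096 - 3612  # fragments present in both objects
stopifnot(6098 + 211 == 6309, 126648 + 5526 == 132174)
```

Here shared evaluates to 1548, which equals 6 x 258 and would be consistent with the six base fragment types (a, b, c, x, y, z) of the 259-residue sequence being common to both objects, while the adduct fragments are specific to each.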

pavel-shliaha mentioned this issue Jan 26, 2018

Provide a method to combine TopDownSets #69

Closed

sgibb added a commit that referenced this issue Jan 27, 2018

Change signature for updateMedianInjectionTime to AbstractTopDownSet; closes #69, see #71

065acfb

sgibb self-assigned this Jan 27, 2018

sgibb added the enhancement and TODO labels Jan 27, 2018

sgibb added this to the bioc 3.7 milestone Jan 27, 2018

sgibb closed this as completed in 790e8f8 Jan 27, 2018

sgibb added a commit that referenced this issue Feb 22, 2018

Use spectrumId instead of acquistionNum to get Scan id; fixes #71

7327ab5
