dealing with different fragments in different fragmentation methods #71

Closed
pavel-shliaha opened this issue Jan 26, 2018 · 3 comments

@pavel-shliaha
Collaborator

pavel-shliaha commented Jan 26, 2018

I understand that it might be a big problem to make it work, but we need to be able to combine datasets with different (partially overlapping or non-overlapping) fragment types. The reason for this bizarre request is that UVPD data tends to lead to some proton rearrangements. See the following passage (PMID: 29336549):

We noticed, however, that there was a poor fit to the a-, x-, and y-ion isotopic envelopes in the non-HDX experiments, and that the inclusion of a+1, a+2, x+1, x+2, y-1 or y-2 ions (i.e., fragment ions which had gained or lost 1 or 2 Da) into the analysis only partially alleviated this problem. It appeared that, in multiple cases, the a-type fragment ions are actually mixtures of a, a+1, a+2, the x-type fragment ions are mixtures of x, x+1, x+2, and the y-type fragment ions are mixtures of y, y-1, and y-2 fragment ions, in varying proportions.

In practice this means that I have to read in the dataset that contains UVPD spectra with all these fragments, which I know are not present in other fragmentation methods. Having so many candidate fragments really decreases the accuracy of matching.

Hence we need to have a way to specify different fragment and adduct types for different fragmentation methods.

While it might take some time to decide how best to implement this, I really wanted to ask you to create a method for combining NCBSet objects. I can easily match the different datasets (e.g. UVPD and ETD) with different fragments, convert them to NCBSet objects, and then combine them. But this is currently impossible, since it returns the following error:

combine(CANCB, CAUVPDNCB)
# Order of conditions changed.
# Error in (function (classes, fdef, mtable)  : 
#  unable to find an inherited method for function ‘updateMedianInjectionTime’ for signature ‘"NCBSet"’
@sgibb
Owner

sgibb commented Jan 27, 2018

I understand the problem. Currently there is no easy solution. I fixed the combine method so that it works for TopDownSet and NCBSet. But currently it expects the rowViews (fragments) to be identical, so in fact it is a simple cbind with some recalculations (median injection time etc.).
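
To illustrate what "a simple cbind" implies, here is a minimal base R sketch. It is not topdownr's actual implementation; the toy matrices and the fragment and scan names are made up for illustration.

```r
## Toy intensity matrices: rows are fragments, columns are scans.
etd <- matrix(c(10, 0, 5, 7), nrow = 2,
              dimnames = list(c("c1", "z1"), c("scan1", "scan2")))
more <- matrix(c(0, 3), nrow = 2,
               dimnames = list(c("c1", "z1"), "scan3"))

## With identical fragments (rows), column-binding is all that is needed.
stopifnot(identical(rownames(etd), rownames(more)))
combined <- cbind(etd, more)
dim(combined)
```

As soon as the row names differ, the check fails, which is why datasets matched against different fragment sets cannot be combined this way.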

I don't have a good idea for a clean interface for readTopDownFiles that supports different adducts for different fragmentation methods (we could add a column to the adduct data.frame, e.g. fragmentationMethod). Alternatively, I could provide a filterFragments function, or modify the combine method to allow different fragments.
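
As a rough sketch of the first option, the adduct data.frame could carry the method in an extra column. The `fragmentationMethod` column and the `adductsFor` helper are hypothetical; neither exists in topdownr at this point. The adduct values are the ones used in the example further down this thread.

```r
H <- 1.0078250321  # mass of hydrogen

## Hypothetical layout: `fragmentationMethod` is the suggested extra
## column, not part of the adduct data.frame readTopDownFiles accepts today.
adducts <- data.frame(
    mass = c(-H, H, 1, 2, 1, 2, -1, -2),
    to = c("c", "z", "a", "a", "x", "x", "y", "y"),
    name = c("cmH", "zpH", "ap1", "ap2", "xp1", "xp2", "ym1", "ym2"),
    fragmentationMethod = c("ETD", "ETD", rep("UVPD", 6)),
    stringsAsFactors = FALSE
)

## Hypothetical helper: select the adducts that apply to one method and
## drop the extra column again.
adductsFor <- function(adducts, method) {
    adducts[adducts$fragmentationMethod == method, c("mass", "to", "name")]
}

adductsFor(adducts, "ETD")
```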

@pavel-shliaha
Collaborator Author

pavel-shliaha commented Jan 27, 2018

I have an idea which I think might address this for now. Let's say I want different fragment types for CID, UVPD and ETD, which are all in one file. I could:

  1. create 3 different TopDownSet objects: ETD (matched to ETD fragments: c and z), CID (matched to CID fragments: b and y) and UVPD (matched to UVPD fragments: a and x)
  2. subset the three objects, so that the ETD object contains only ETD spectra, the CID object only CID spectra, and the UVPD object only UVPD spectra
  3. merge them together

This would, however, require the ability to merge TopDownSet objects with different fragment types. It would be much less messy than trying to load different spectra with different fragments.
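
The merge in step 3 can be sketched in base R on plain intensity matrices. This is only an illustration under assumptions: `combineFragments` is a hypothetical helper, the toy data are made up, column (scan) names are assumed to be unique across inputs, and real TopDownSet objects carry condition and fragment metadata that this ignores.

```r
## Combine two fragment x scan matrices whose rows (fragments) differ.
## Fragments absent from one input are filled with 0 (i.e. not matched).
combineFragments <- function(x, y) {
    fragments <- union(rownames(x), rownames(y))
    out <- matrix(0, nrow = length(fragments), ncol = ncol(x) + ncol(y),
                  dimnames = list(fragments, c(colnames(x), colnames(y))))
    out[rownames(x), colnames(x)] <- x
    out[rownames(y), colnames(y)] <- y
    out
}

## Toy data: one scan per fragmentation method.
etd <- matrix(c(10, 5), nrow = 2,
              dimnames = list(c("c1", "z1"), "etd_scan1"))
cid <- matrix(c(8, 2), nrow = 2,
              dimnames = list(c("b1", "y1"), "cid_scan1"))
uvpd <- matrix(c(4, 1, 3), nrow = 3,
               dimnames = list(c("a1", "x1", "y1"), "uvpd_scan1"))

merged <- Reduce(combineFragments, list(etd, cid, uvpd))
merged
```

The result has one row per fragment seen in any input, so a fragment shared between methods (here `y1`) appears once, with its intensity in each method's own column.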

sgibb closed this as completed in 790e8f8 on Jan 27, 2018
@sgibb
Owner

sgibb commented Jan 27, 2018

OK, now combine works for different fragments, too. So you can call readTopDownFiles with different fragmentation types and different adducts and combine the results directly after reading the files. Subsequently, the whole filtering etc. should work on the combined object. (But you are also free to do the whole preprocessing for each object and combine the resulting NCBSet objects.)

library("topdownr")
path <- "../topdownrdata/inst/extdata/20170703_ca/"

#' Default workflow

H <- 1.0078250321

ca <- readTopDownFiles(
    path = path,
    ## load fasta and ETD data
    pattern=".*fasta.gz$|ETDReagentTarget.*|missing.*",
    type = c("a", "b", "c", "x", "y", "z"),
    adducts = data.frame(
        mass=c(-H, H),
        to=c("c", "z"),
        name=c("cmH", "zpH")),
    modifications = "Met-loss",
    neutralLoss = NULL
)
# TopDownSet object (4.51 Mb)
# - - - Protein data - - -
# Amino acid sequence (259): SHHWGYGKHNGPEHWHKDFPIANGER...PELLMLANWRPAQPLKNRQVRGFPK
# Mass : 28946.66
# Modifications (1): Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 2064
# Theoretical fragment types (6): a, b, c, x, y, z
# Theoretical mass range: [60.04;28904.63]
# - - - Condition data - - -
# Number of conditions: 1853
# Number of scans: 6098
# Condition variables (60): File, Scan, ..., Sample, MedianIonInjectionTimeMs
# - - - Intensity data - - -
# Size of array: 2064x6098 (1.01% != 0)
# Number of matched fragments: 126648
# Intensity range: [40.30;3727119.75]
# - - - Processing information - - -
# [2018-01-27 18:33:12] 126648 fragments [2064;6098] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:12] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation. Order of conditions changed. 1853 conditions.
# [2018-01-27 18:33:12] Recalculate median injection time based on: Mz, AgcTarget.

cauvpd <- readTopDownFiles(
    path = path,
    ## load fasta and UVPD data
    pattern=".*fasta.gz$|UVPD.*",
    type = c("a", "b", "c", "x", "y", "z"),
    adducts = data.frame(
        mass=c(1, 2, 1, 2, -1, -2),
        to=c("a", "a", "x", "x", "y", "y"),
        name=c("ap1", "ap2", "xp1", "xp2", "ym1", "ym2")),
    modifications = "Met-loss",
    neutralLoss = NULL
)
# TopDownSet object (0.66 Mb)
# - - - Protein data - - -
# Amino acid sequence (259): SHHWGYGKHNGPEHWHKDFPIANGER...PELLMLANWRPAQPLKNRQVRGFPK
# Mass : 28946.66
# Modifications (1): Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 3096
# Theoretical fragment types (6): a, b, c, x, y, z
# Theoretical mass range: [60.04;28906.63]
# - - - Condition data - - -
# Number of conditions: 64
# Number of scans: 211
# Condition variables (42): File, Scan, ..., Sample, MedianIonInjectionTimeMs
# - - - Intensity data - - -
# Size of array: 3096x211 (0.85% != 0)
# Number of matched fragments: 5526
# Intensity range: [17.72;283448.38]
# - - - Processing information - - -
# [2018-01-27 18:33:13] 5526 fragments [3096;211] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:13] Condition names updated based on: Mz, AgcTarget, UvpdActivation. Order of conditions changed. 64 conditions.
# [2018-01-27 18:33:13] Recalculate median injection time based on: Mz, AgcTarget.

cacmb <- combine(ca, cauvpd)
cacmb
# TopDownSet object (4.91 Mb)
# - - - Protein data - - -
# Amino acid sequence (259): SHHWGYGKHNGPEHWHKDFPIANGER...PELLMLANWRPAQPLKNRQVRGFPK 
# Mass : 28946.66
# Modifications (1): Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 3612 
# Theoretical fragment types (6): a, b, c, x, y, z
# Theoretical mass range: [60.04;28906.63]
# - - - Condition data - - -
# Number of conditions: 1917 
# Number of scans: 6309 
# Condition variables (63): File, Scan, ..., UvpdActivation, UvpdTime
# - - - Intensity data - - -
# Size of array: 3612x6309 (0.58% != 0)
# Number of matched fragments: 132174 
# Intensity range: [17.72;3727119.75]
# - - - Processing information - - -
# [2018-01-27 18:33:12] 126648 fragments [2064;6098] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:12] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation. Order of conditions changed. 1853 conditions.
# [2018-01-27 18:33:12] Recalculate median injection time based on: Mz, AgcTarget.
# [2018-01-27 18:33:13] 5526 fragments [3096;211] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:13] Condition names updated based on: Mz, AgcTarget, UvpdActivation. Order of conditions changed. 64 conditions.
# [2018-01-27 18:33:13] Recalculate median injection time based on: Mz, AgcTarget.
# [2018-01-27 18:34:21] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation, UvpdActivation. Order of conditions changed. 1917 conditions.
# [2018-01-27 18:34:21] Recalculate median injection time based on: Mz, AgcTarget.
# [2018-01-27 18:34:21] Combined 126648 fragments [2064;6098] and 5526 fragments [3096;211] into a 132174 fragments [3612;6309] TopDownSet object. 
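
The sizes printed above can be checked with a little arithmetic. The sketch below assumes that every fragment type and every adduct contributes one theoretical fragment per cleavage site; that assumption is not stated in this thread, but it reproduces the printed numbers.

```r
sites <- 259 - 1               # cleavage sites in the 259-residue sequence
types <- 6                     # a, b, c, x, y, z
base <- types * sites          # fragments without adducts

etd <- base + 2 * sites        # plus cmH and zpH
uvpd <- base + 6 * sites       # plus ap1, ap2, xp1, xp2, ym1, ym2
combined <- etd + uvpd - base  # shared unmodified fragments counted once

c(etd = etd, uvpd = uvpd, combined = combined)
# 2064 3096 3612

scans <- 6098 + 211            # 6309 scans in the combined object
matched <- 126648 + 5526       # 132174 matched fragments
round(100 * matched / (combined * scans), 2)
# 0.58 (percent of non-zero entries)
```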
