dealing with different fragments in different fragmentation methods #71
I understand the problem. Currently there is no easy solution. I fixed the combine method so that it is working for I don't have an idea to provide a clean interface for
I have an idea which I think might address this for now. Let's say I want different fragment types for CID, UVPD and ETD, which are all in one file. I could:
This would, however, require the ability to merge TopDownSets with different fragment types. It would be much less messy than trying to load different spectra with different fragments.
Ok, now
library("topdownr")
path <- "../topdownrdata/inst/extdata/20170703_ca/"
#' Default workflow
H <- 1.0078250321
ca <- readTopDownFiles(
    path = path,
    ## load fasta and ETD data
    pattern = ".*fasta.gz$|ETDReagentTarget.*|missing.*",
    type = c("a", "b", "c", "x", "y", "z"),
    adducts = data.frame(
        mass = c(-H, H),
        to = c("c", "z"),
        name = c("cmH", "zpH")),
    modifications = "Met-loss",
    neutralLoss = NULL
)
# TopDownSet object (4.51 Mb)
# - - - Protein data - - -
# Amino acid sequence (259): SHHWGYGKHNGPEHWHKDFPIANGER...PELLMLANWRPAQPLKNRQVRGFPK
# Mass : 28946.66
# Modifications (1): Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 2064
# Theoretical fragment types (6): a, b, c, x, y, z
# Theoretical mass range: [60.04;28904.63]
# - - - Condition data - - -
# Number of conditions: 1853
# Number of scans: 6098
# Condition variables (60): File, Scan, ..., Sample, MedianIonInjectionTimeMs
# - - - Intensity data - - -
# Size of array: 2064x6098 (1.01% != 0)
# Number of matched fragments: 126648
# Intensity range: [40.30;3727119.75]
# - - - Processing information - - -
# [2018-01-27 18:33:12] 126648 fragments [2064;6098] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:12] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation. Order of conditions changed. 1853 conditions.
# [2018-01-27 18:33:12] Recalculate median injection time based on: Mz, AgcTarget.
cauvpd <- readTopDownFiles(
    path = path,
    ## load fasta and UVPD data
    pattern = ".*fasta.gz$|UVPD.*",
    type = c("a", "b", "c", "x", "y", "z"),
    adducts = data.frame(
        mass = c(1, 2, 1, 2, -1, -2),
        to = c("a", "a", "x", "x", "y", "y"),
        name = c("ap1", "ap2", "xp1", "xp2", "ym1", "ym2")),
    modifications = "Met-loss",
    neutralLoss = NULL
)
# TopDownSet object (0.66 Mb)
# - - - Protein data - - -
# Amino acid sequence (259): SHHWGYGKHNGPEHWHKDFPIANGER...PELLMLANWRPAQPLKNRQVRGFPK
# Mass : 28946.66
# Modifications (1): Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 3096
# Theoretical fragment types (6): a, b, c, x, y, z
# Theoretical mass range: [60.04;28906.63]
# - - - Condition data - - -
# Number of conditions: 64
# Number of scans: 211
# Condition variables (42): File, Scan, ..., Sample, MedianIonInjectionTimeMs
# - - - Intensity data - - -
# Size of array: 3096x211 (0.85% != 0)
# Number of matched fragments: 5526
# Intensity range: [17.72;283448.38]
# - - - Processing information - - -
# [2018-01-27 18:33:13] 5526 fragments [3096;211] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:13] Condition names updated based on: Mz, AgcTarget, UvpdActivation. Order of conditions changed. 64 conditions.
# [2018-01-27 18:33:13] Recalculate median injection time based on: Mz, AgcTarget.
cacmb <- combine(ca, cauvpd)
cacmb
# TopDownSet object (4.91 Mb)
# - - - Protein data - - -
# Amino acid sequence (259): SHHWGYGKHNGPEHWHKDFPIANGER...PELLMLANWRPAQPLKNRQVRGFPK
# Mass : 28946.66
# Modifications (1): Met-loss
# - - - Fragment data - - -
# Number of theoretical fragments: 3612
# Theoretical fragment types (6): a, b, c, x, y, z
# Theoretical mass range: [60.04;28906.63]
# - - - Condition data - - -
# Number of conditions: 1917
# Number of scans: 6309
# Condition variables (63): File, Scan, ..., UvpdActivation, UvpdTime
# - - - Intensity data - - -
# Size of array: 3612x6309 (0.58% != 0)
# Number of matched fragments: 132174
# Intensity range: [17.72;3727119.75]
# - - - Processing information - - -
# [2018-01-27 18:33:12] 126648 fragments [2064;6098] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:12] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation. Order of conditions changed. 1853 conditions.
# [2018-01-27 18:33:12] Recalculate median injection time based on: Mz, AgcTarget.
# [2018-01-27 18:33:13] 5526 fragments [3096;211] matched (tolerance: 5 ppm).
# [2018-01-27 18:33:13] Condition names updated based on: Mz, AgcTarget, UvpdActivation. Order of conditions changed. 64 conditions.
# [2018-01-27 18:33:13] Recalculate median injection time based on: Mz, AgcTarget.
# [2018-01-27 18:34:21] Condition names updated based on: Mz, AgcTarget, EtdReagentTarget, EtdActivation, CidActivation, HcdActivation, UvpdActivation. Order of conditions changed. 1917 conditions.
# [2018-01-27 18:34:21] Recalculate median injection time based on: Mz, AgcTarget.
# [2018-01-27 18:34:21] Combined 126648 fragments [2064;6098] and 5526 fragments [3096;211] into a 132174 fragments [3612;6309] TopDownSet object.
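The fragment counts printed above can be checked with a little arithmetic. The sketch below assumes that every fragment type and every adduct contributes one theoretical fragment per cleavage site (sequence length minus one). That rule is an assumption made for illustration, not something taken from the topdownr source, but it reproduces the 2064, 3096 and 3612 rows shown in the three objects, which suggests that the combined object holds the union of the shared a/b/c/x/y/z fragments plus both adduct sets.

```r
## Assumption: one theoretical fragment per cleavage site for each
## fragment type and for each adduct (hypothetical counting rule).
n_sites <- 259 - 1      # sequence length 259, so 258 cleavage sites

n_types <- 6            # a, b, c, x, y, z (shared by both objects)
n_adducts_etd <- 2      # cmH, zpH
n_adducts_uvpd <- 6     # ap1, ap2, xp1, xp2, ym1, ym2

n_etd <- (n_types + n_adducts_etd) * n_sites
n_uvpd <- (n_types + n_adducts_uvpd) * n_sites
## union: shared types are counted once, adduct sets do not overlap
n_combined <- (n_types + n_adducts_etd + n_adducts_uvpd) * n_sites

c(etd = n_etd, uvpd = n_uvpd, combined = n_combined)
```

The same bookkeeping holds for the columns: 6098 + 211 = 6309 scans and 1853 + 64 = 1917 conditions.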
I understand that it might be a big problem to make it work, but we need to be able to combine datasets with different (partially overlapping or non-overlapping) fragment types. The reason for this bizarre request is that UVPD data tends to lead to some proton rearrangements. Please see here (PMID: 29336549):
This means in practice that I have to read in the dataset that contains UVPD with all these fragments, which I know are not present in other fragmentation methods. Having so many fragments really decreases the accuracy of matching.
Hence we need to have a way to specify different fragment and adduct types for different fragmentation methods.
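To make the request concrete, here is a minimal base-R sketch of what combining two intensity matrices with partially overlapping fragment sets could look like. It does not use topdownr internals: `combine_by_fragment` and the toy matrices are hypothetical names introduced only for this illustration. Rows are fragments, columns are scans, and a fragment that was never matched in one dataset is simply filled with zero there.

```r
## Hypothetical helper (not part of topdownr): merge two fragment x scan
## intensity matrices on the union of their fragment (row) names.
combine_by_fragment <- function(x, y) {
    ## scans from the two datasets must be distinguishable
    stopifnot(!anyDuplicated(c(colnames(x), colnames(y))))
    frags <- union(rownames(x), rownames(y))
    out <- matrix(
        0, nrow = length(frags), ncol = ncol(x) + ncol(y),
        dimnames = list(frags, c(colnames(x), colnames(y)))
    )
    ## copy each dataset into its own block; everything else stays 0
    out[rownames(x), colnames(x)] <- x
    out[rownames(y), colnames(y)] <- y
    out
}

## Toy data: the ETD-like set has c/z fragments, the UVPD-like set has a/x
## fragments; only b1 is shared between them.
etd <- matrix(
    c(10, 0, 5, 0, 7, 3), nrow = 3,
    dimnames = list(c("b1", "c1", "z1"), c("etd_1", "etd_2"))
)
uvpd <- matrix(
    c(2, 4, 6), nrow = 3,
    dimnames = list(c("b1", "a1", "x1"), "uvpd_1")
)

res <- combine_by_fragment(etd, uvpd)
res
```

With this layout each fragmentation method is matched only against the fragments that make sense for it, and the combined matrix still has one row per fragment across all methods.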
While it might take some time to decide how best to implement this, I really wanted to ask you to create a method for combining NCBsets. I can easily match the different datasets (e.g. UVPD and ETD) with different fragments, convert them to NCBsets, and then combine them. But this is currently impossible, since it returns the following error: