New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
matching control (issue specific to UVPD data) #74
Comments
as a quick example: ` spectraList <- readMSData("CA_UVPD_745_.mzML", msLevel. = 2, mode = "onDisk") rtScan <- rtime (spectraList)/60 TXT <- read.csv ("CA_UVPD_745_.txt", stringsAsFactors = FALSE) rtTXT <- TXT$RT..min. matchingRT <- sapply(rtScan, function(x) which.min(abs(rtTXT-x))) plot (rtScan, rtTXT[matchingRT]) TXT <- TXT[matchingRT, ] nrow (TXT) == length (spectraList) |
As demonstrated in this comment #73 (comment) the Just for verification: With retention time based matching control (but with old H <- 1.0078250321
sampleColumns <- c("Mz", "AgcTarget", "UvpdActivation")
UVPDFull <- readTopDownFiles(path = getwd(),
type = c("a", "b", "c", "x", "y", "z"),
adducts = data.frame(mass=c(H, H, -H, -H, H),
name=c("apH", "xpH", "ymH", "zmH", "zpH"),
to=c("a", "x", "y", "z", "z")),
neutralLoss = NULL,
modifications = c("Carbamidomethyl", "Acetyl", "Met-loss"),
sampleColumns = sampleColumns,
tolerance = 5e-6)
# Reading sequence from fasta file CA.fasta.gz
# Reading 74 header information from file CA_UVPD_1162_.txt
# Warning in FUN(X[[i]], ...) :
# 11 FilterString entries modified because of duplicated ID for different conditions.
# Reading 73 header information from file CA_UVPD_1162_2.txt
# Warning in FUN(X[[i]], ...) :
# 12 FilterString entries modified because of duplicated ID for different conditions.
# Reading 80 header information from file CA_UVPD_745_.txt
# Reading 80 header information from file CA_UVPD_745_2.txt
# Reading 73 header information from file CA_UVPD_908_.txt
# Reading 71 header information from file CA_UVPD_908_2.txt
# Reading 37 experiment conditions from file CA_UVPD_1162_.experiments.csv
# Reading 37 experiment conditions from file CA_UVPD_1162_2.experiments.csv
# Reading 37 experiment conditions from file CA_UVPD_745_.experiments.csv
# Reading 37 experiment conditions from file CA_UVPD_745_2.experiments.csv
# Reading 37 experiment conditions from file CA_UVPD_908_.experiments.csv
# Reading 37 experiment conditions from file CA_UVPD_908_2.experiments.csv
# Reading spectra information from file CA_UVPD_1162_.mzML (17.6%)
# Reading spectra information from file CA_UVPD_1162_2.mzML (18.6%)
# Reading spectra information from file CA_UVPD_745_.mzML (26.5%)
# Reading spectra information from file CA_UVPD_745_2.mzML (27.3%)
# Reading spectra information from file CA_UVPD_908_.mzML (25.7%)
# Reading spectra information from file CA_UVPD_908_2.mzML (29.1%)
# Error in .mergeSpectraAndHeaderInformation(mzmlHeader, scanHeadsman) :
# The retention times from header and spectra files differ. The latest version with UVPDFull <- readTopDownFiles(path = getwd(),
type = c("a", "b", "c", "x", "y", "z"),
adducts = data.frame(mass=c(H, H, -H, -H, H),
name=c("apH", "xpH", "ymH", "zmH", "zpH"),
to=c("a", "x", "y", "z", "z")),
neutralLoss = NULL,
modifications = c("Carbamidomethyl", "Acetyl", "Met-loss"),
sampleColumns = sampleColumns,
tolerance = 5e-6)
# Reading sequence from fasta file CA.fasta.gz
# Reading 74 header information from file CA_UVPD_1162_.txt
# Warning in FUN(X[[i]], ...) :
# 11 FilterString entries modified because of duplicated ID for different conditions.
# Reading 73 header information from file CA_UVPD_1162_2.txt
# Warning in FUN(X[[i]], ...) :
# 12 FilterString entries modified because of duplicated ID for different conditions.
# Reading 80 header information from file CA_UVPD_745_.txt
# Reading 80 header information from file CA_UVPD_745_2.txt
# Reading 73 header information from file CA_UVPD_908_.txt
# Reading 71 header information from file CA_UVPD_908_2.txt
# Reading 37 experiment conditions from file CA_UVPD_1162_.experiments.csv
# Reading 37 experiment conditions from file CA_UVPD_1162_2.experiments.csv
# Reading 37 experiment conditions from file CA_UVPD_745_.experiments.csv
# Reading 37 experiment conditions from file CA_UVPD_745_2.experiments.csv
# Reading 37 experiment conditions from file CA_UVPD_908_.experiments.csv
# Reading 37 experiment conditions from file CA_UVPD_908_2.experiments.csv
# Reading spectra information from file CA_UVPD_1162_.mzML
# Warning in (function (file, scans, fmass, ..., verbose = interactive()) :
# Using spectrumId scan information because acquisitionNum entries aren't valid.
# (22.4%)
# Reading spectra information from file CA_UVPD_1162_2.mzML
# Warning in (function (file, scans, fmass, ..., verbose = interactive()) :
# Using spectrumId scan information because acquisitionNum entries aren't valid.
# (23.2%)
# Reading spectra information from file CA_UVPD_745_.mzML
# Warning in (function (file, scans, fmass, ..., verbose = interactive()) :
# Using spectrumId scan information because acquisitionNum entries aren't valid.
# (28.2%)
# Reading spectra information from file CA_UVPD_745_2.mzML
# Warning in (function (file, scans, fmass, ..., verbose = interactive()) :
# Using spectrumId scan information because acquisitionNum entries aren't valid.
# (28.7%)
# Reading spectra information from file CA_UVPD_908_.mzML
# Warning in (function (file, scans, fmass, ..., verbose = interactive()) :
# Using spectrumId scan information because acquisitionNum entries aren't valid.
# (27.5%)
# Reading spectra information from file CA_UVPD_908_2.mzML
# Warning in (function (file, scans, fmass, ..., verbose = interactive()) :
# Using spectrumId scan information because acquisitionNum entries aren't valid.
# (29.6%)
table(UVPDFull$AgcTarget)
# 100000 500000 1000000
# 129 106 96 |
Do I understand correctly that you first try to match on "acquisitionNum" and if this fails according to rtime you then go into "spectrumId"? I get the following message for the UVPD files "Using spectrumId scan information because acquisitionNum entries aren't valid." |
No, sry. This message is misleading I will remove it (I thought it would be a good idea to show that the files are somehow different). The matching is always done using the information from the |
can I suggest different way of checking? Not with a particular deviation but just checking whether the rtime of the scan from mzML is closest to the one from the scanHeadsMan txt, that has been matched to it through spectrumID? |
Does this make a real difference? My approach ensures that the times are similar yours that they are the closest. While my could approach is a simple |
I am actually not sure. Thermo can always create additional problems by not having the exact rt written into the files, so my aesthetic preference is having it more robust (i.e. closest). However, we can change it when we get there. For now I think your solution is very smart, lets keep it |
We have a problem matching the scans in the mzML to the lines in the ScanHeadsMan txt for 2 reasons:
I have looked through how I did the initial matching an found that acquisitionNum() function from MSnbase returns the correct indices for the myoglobin files, but not for the UVPD dataset. Same goes for the header() function from mzR. So everything was ok, until we got to the UVPD data.
This explains why ETD data made sense
`
setwd ("M:\r_scripts\2017_01_18_top_down_lumos_first_large_experiment\2017_06_29_myo\raw")
mzMLFiles <- list.files(pattern = "mzML")[1]
spectraList <- readMSData(mzMLFiles, msLevel. = 2, mode = "onDisk")
spectraTable <- data.frame (File = gsub (".mzML", "", mzMLFiles)[fromFile (spectraList)],
Scan = acquisitionNum (spectraList), stringsAsFactors = FALSE)
table (spectraTable[2:nrow (spectraTable), "Scan"] - spectraTable[1: (nrow (spectraTable) - 1), "Scan"])
`
I know it is a huge pain, but can we try to match by retention time and filter string?
The text was updated successfully, but these errors were encountered: