Error in Data Frame #261
Thank you for the report. Would you be able to share the |
Please note that I am using an mzXML file instead of mzML file because Johannes Rainner told me that the readMSData function uses mzR which was designed to accept both mzML and mzXML files. If it is the case that readMSData accepts strictly mzML files, I will try it out with the said format. Here is a link to dropbox where the two files are stored. They were too big to copy and paste directly. |
Thanks. I have been travelling and just managed to download the file now. I will look at it asap. |
The error that you saw stems from the fact that the mzid file claims two source files, while only one is expected:
> id <- openIDfile("~/tmp/Eta6/Eta6.mzid")
> sourceInfo(id)
[1] "C:\\Users\\Student\\Desktop\\170511_RF_Eta\\170511_RF_Eta6.mzML"
[2] "C:\\Users\\Student\\Desktop\\170511_RF_Eta6.raw"
I then ran into a second error due to the lack of any identification scores:
> score(id)
No scoring information available
data frame with 0 columns and 0 rows
Both errors are fixed in the latest version, available from GitHub now (and soon also on Bioconductor), but I am wondering whether Proteome Discoverer is doing a good job. I haven't heard of these errors before, and I am not very confident in a piece of software that doesn't report its scores, which is not very good practice when it comes to downstream analysis anyway. With the latest version, I get:
> rw <- readMSData("~/tmp/Eta6/Eta6.mzXML", mode = "onDisk")
> rw <- addIdentificationData(rw, "~/tmp/Eta6/Eta6.mzid")
No scoring information available |
If this now also works for you, could you please close this issue, or report any additional problems you see. |
It's still not working and I do not know why. I downloaded a copy of MSnbase from GitHub using the following code:
The session info is exactly the same as the one shown above. I then ran the following line of code:
I suspect that I may have the old version without the new improvements. Where can I download the new one? |
You are right about the old version. You'll need to install the very latest version directly from GitHub:
library("BiocInstaller")
biocLite("lgatto/MSnbase")
Once you confirm that it works, I will push it to Bioconductor. |
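A note for readers on a current Bioconductor release: `BiocInstaller` and `biocLite()` have since been superseded by the `BiocManager` package, so the equivalent call today would likely be `BiocManager::install("lgatto/MSnbase")`. Whichever installer is used, it can help to confirm which version R actually loads, ideally in a fresh session. The helper below is only a sketch: the name `needs_update` and the version strings are illustrative assumptions, not something from this thread.

```r
# Hypothetical helper: TRUE when `pkg` is missing or older than `min_version`.
needs_update <- function(pkg, min_version) {
    if (!requireNamespace(pkg, quietly = TRUE))
        return(TRUE)
    utils::packageVersion(pkg) < min_version
}

# Example with a package that ships with R, so the call runs anywhere.
needs_update("stats", "0.0.1")

# For MSnbase, the minimum version is an assumption to fill in yourself:
# if (needs_update("MSnbase", "<version containing the fix>"))
#     BiocManager::install("lgatto/MSnbase")
```

If the check still reports an old version after installing, restarting R before loading the package is usually the first thing to try, since an already loaded namespace is not replaced by a new installation.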
Here is my session info after installation
Matrix products: default
locale:
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
The error is still there:
|
Could you report the output of
> getMethod("coerce", c("mzRident", "data.frame"))
Method Definition:
function (from, to = "data.frame", strict = TRUE)
{
iddf <- factorsAsStrings(psms(from))
src <- basename(sourceInfo(from))
if (length(src) > 1)
src <- paste(src, collapse = ";")
iddf$spectrumFile <- src
iddf$idFile <- basename(fileName(from))
scores <- factorsAsStrings(score(from))
if (nrow(scores)) {
stopifnot(identical(iddf[, 1], scores[, 1]))
iddf <- cbind(iddf, scores[, -1])
}
mods <- factorsAsStrings(modifications(from))
names(mods)[-1] <- makeCamelCase(names(mods), prefix = "mod")[-1]
iddf <- merge(iddf, mods, by.x = c("spectrumID", "sequence"),
by.y = c("spectrumID", "modSequence"), suffixes = c("",
".y"), all = TRUE, sort = FALSE)
iddf[, "spectrumID.y"] <- NULL
subs <- factorsAsStrings(substitutions(from))
names(subs)[-1] <- makeCamelCase(names(subs), prefix = "sub")[-1]
iddf <- merge(iddf, subs, by.x = c(spectrumID = "sequence"),
by.y = c(spectrumID = "subSequence"), suffixes = c("",
".y"), all = TRUE, sort = FALSE)
iddf[, "spectrumID.y"] <- NULL
iddf
}
<environment: namespace:MSnbase>
Signatures:
from to
target "mzRident" "data.frame"
defined "mzRident" "data.frame" |
Here it is:
> getMethod("coerce", c("mzRident", "data.frame"))
Method Definition:
function (from, to = "data.frame", strict = TRUE)
{
iddf <- factorsAsStrings(psms(from))
iddf$spectrumFile <- basename(sourceInfo(from))
iddf$idFile <- basename(fileName(from))
scores <- factorsAsStrings(score(from))
stopifnot(identical(iddf[, 1], scores[, 1]))
iddf <- cbind(iddf, scores[, -1])
mods <- factorsAsStrings(modifications(from))
names(mods)[-1] <- makeCamelCase(names(mods), prefix = "mod")[-1]
iddf <- merge(iddf, mods, by.x = c("spectrumID", "sequence"),
by.y = c("spectrumID", "modSequence"), suffixes = c("",
".y"), all = TRUE, sort = FALSE)
iddf[, "spectrumID.y"] <- NULL
subs <- factorsAsStrings(substitutions(from))
names(subs)[-1] <- makeCamelCase(names(subs), prefix = "sub")[-1]
iddf <- merge(iddf, subs, by.x = c(spectrumID = "sequence"),
by.y = c(spectrumID = "subSequence"), suffixes = c("",
".y"), all = TRUE, sort = FALSE)
iddf[, "spectrumID.y"] <- NULL
iddf
}
<bytecode: 0x000000002c69a710>
<environment: namespace:MSnbase>
Signatures:
from to
target "mzRident" "data.frame"
defined "mzRident" "data.frame" |
Ok, so for some reason, you are still running the old code. Could you
|
Thanks a lot. Now works. |
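The two `coerce` listings above differ in two guards: the newer one collapses several source file names into one string, and it only binds the scores when there are any. The base-R sketch below illustrates why those guards matter. The data frame and file names are toy stand-ins for `psms(id)`, `sourceInfo(id)` and `score(id)`, and the three-row size is chosen purely for illustration. This is not MSnbase code.

```r
# Toy stand-ins (assumptions), not real MSnbase output.
iddf <- data.frame(spectrumID = c("scan=1", "scan=2", "scan=3"),
                   sequence = c("PEPTIDEK", "ASGVAVSDGVIK", "SAMPLEK"),
                   stringsAsFactors = FALSE)
src <- c("170511_RF_Eta6.mzML", "170511_RF_Eta6.raw")

# Without the guard: two file names cannot fill a three-row column.
old <- tryCatch({
    tmp <- iddf
    tmp$spectrumFile <- src
    "ok"
}, error = function(e) "error")

# With the guard: collapse to a single string, which is then recycled.
if (length(src) > 1)
    src <- paste(src, collapse = ";")
iddf$spectrumFile <- src

# Scores are only bound when the score table is non-empty.
scores <- data.frame()
if (nrow(scores)) {
    stopifnot(identical(iddf[, 1], scores[, 1]))
    iddf <- cbind(iddf, scores[, -1])
}
```

With an empty score table, the unguarded `stopifnot(identical(...))` from the older listing would stop, which matches the second error described earlier in the thread.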
Unfortunately, for some reason, I have run into more errors during the course of the night. This might be classified as a question and not necessarily an issue.
Lastly, do the comparison matrices compare any two different spectra, or do they compare the same peptide spectra in different samples? Is the cluster dendrogram supposed to cluster different runs/experiments, or does it cluster just the spectra in one run/experiment? Can I use the cluster dendrogram to cluster different batches of a recombinant protein drug? |
I am afraid that we can't help you with these few lines of warnings and errors. Somehow your
The comparison method compares each spectrum against all the others in the
I guess that should be possible. |
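To make the batch-clustering idea concrete, here is a minimal base-R sketch. It assumes each batch has already been reduced to a vector of intensities for the same set of peptides. The numbers are made-up toy values, and `cor()` plus `hclust()` stand in for whatever comparison function is actually used, so this is not the `compareSpectra` workflow itself.

```r
# Toy intensities (assumption): rows are peptides, columns are batches.
intens <- cbind(
    batchA = c(10, 50, 30, 80, 20),
    batchB = c(11, 52, 29, 79, 21),
    batchC = c(40, 10, 70, 15, 60))

# Pairwise comparison matrix between batches.
compmat <- cor(intens)

# Turn similarity into a distance and cluster hierarchically.
hc <- hclust(as.dist(1 - compmat))
groups <- cutree(hc, k = 2)

# plot(hc) would draw the dendrogram.
```

Here `batchA` and `batchB` end up in the same group because their intensity profiles are nearly identical, while `batchC` is separated. Whether correlation is the right similarity measure for real batches is a separate question.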
Have a look at
Re centroided, you can set this with
There are a few things a bit weird in your code. For example, you type
qnt.max <- normalise(msexp, "max")
The name
Re quantitation (with both methods), I think there's something else wrong. Somehow, the peptide sequences are set to NA. I'll have to look closer. |
@NebuchadnezzaraHardrada - is it normal that all 1107 PSMs match one database entry, namely |
I managed to work on the problem using your suggestions and the following now stands.
ionCount(msexp[[1]])
plot(ionCount(msexp[[1]]))
> sc <- quantify(msexp, method = "count")
Error in `sampleNames<-`(`*tmp*`, value = sampleNames(phenoData)) :
'value' length (0) must equal sample number in AssayData (1)
> compareSpectra(centroided[[1]], centroided[[2]],fun = "common")
Error in centroided[[1]] : object of type 'closure' is not subsettable
After adding labels, the error persisted but its type changed.
> compareSpectra(centroided[[F1.S00001]], centroided[[F1.S00002]],fun = "common")
Error in centroided[[F1.S00001]] : object 'F1.S00001' not found
F1.S00001 is what you get when you type the following line of code, which I assumed is the data label.
> head(fData(msexp), n = 50)
spectrum
F1.S00001 1
F1.S00002 2
> compmat <- compareSpectra(centroided, fun="cor")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘compareSpectra’ for signature ‘"standardGeneric", "missing"’
centroided <- pickPeaks(msexp, verbose = FALSE) (THIS ONE WORKS FINE), but the next line of code does not.
> (k <- which(fData(centroided)[, "PeptideSequence"] == "ASGVAVSDGVIK"))
Error in `[.data.frame`(fData(centroided), , "PeptideSequence") :
undefined columns selected
From my limited understanding, plot(centroided[[k[1]]], centroided[[k[2]]]) is used to compare two different spectra from the same MSnExp object. Can the same be done to compare two different spectra from two different MSnExp objects, or the same spectrum (e.g. X1) from a whole MSnSet?
@lgatto I expected all the PSMs to match one database entry because the sample contains a pharmaceutical grade recombinant protein drug searched against its own database. Had we had more hits, that would indicate that the protein drug contains other sequences and is therefore not pure. Since the peptide sequence is known and is essentially the same in all samples, figure 18 is my only hope of proving differences between batches, because the differences lie only in the intensities of specific peptides.
What is the best way to load more than 60 mzXML files with their mzID files, clean up and remove spectra, compare the same spectra across all files (or two at a time), and finally cluster these comparisons with a dendrogram? Is MSnbase designed to do this, or am I trying to use the wrong tool? |
Identify your spectrum of interest, for example the 10th, then use plot(msexp[[10]]) (possibly with additional options)
It's only for demonstration, but there isn't anything special about it. You can either use base plotting and then add
p3 <- plot(clean(removePeaks(ppsp,t=3)), full = TRUE, plot = FALSE) +
theme_gray(5) +
geom_point(size=3,alpha=I(1/3)) +
geom_hline(yintercept=3,linetype=2) +
ggtitle("Peaks < 3 removed and cleaned")
This error
> compareSpectra(centroided[[1]], centroided[[2]],fun = "common")
Error in centroided[[1]] : object of type 'closure' is not subsettable
indicates that you don't have a data object called
In the vignette, we also call an object
Regarding the error when quantifying, the problem stems from the nature of your identification results. I don't understand why, but the data indicates that all results come from the same acquisition number:
> iddf <- readMzIdData("~/Downloads/Eta6.mzid")
> table(iddf$acquisitionNum)
2
1107
This messes up the filtering and the addition of your identification data to the raw data.
To load 60 files, you need the on disk infrastructure: use
The package is |
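The 'closure' error discussed in this reply can be reproduced in plain R. The sketch below uses a locally defined function as a stand-in for a package function that shares its name with the intended data object. The toy list and its contents are assumptions for illustration only.

```r
# Stand-in (assumption) for a function exported under this name.
centroided <- function(object) NULL

# Indexing a function fails: this is the "closure is not subsettable" case.
msg_closure <- tryCatch(centroided[[1]],
                        error = function(e) conditionMessage(e))

# Once the name is bound to data, `[[` works by position ...
centroided <- list(F1.S00001 = c(1, 2, 3), F1.S00002 = c(4, 5, 6))
first <- centroided[[1]]

# ... and by name, provided the name is a quoted string.
by_name <- centroided[["F1.S00002"]]

# An unquoted name is looked up as a variable and is not found.
msg_unquoted <- tryCatch(centroided[[F1.S00001]],
                         error = function(e) conditionMessage(e))
```

So both errors reported above have the same flavour: in the first, the name refers to a function rather than to data, and in the second the label is treated as a variable instead of a string.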
I tried working with 3 files and for some reason, the readMSData function is now giving an error.
x<-"E:/Data/Kundai/RStudio/XCMS/MZID/Trial2/MZXML"
In the meantime, you can just answer the above-mentioned question and close the issue. I will try to use other files with straightforward identification results and will get back to you if the problems still persist. Thanks for the help |
For multiple files, you need to provide a vector of file names, something like
> msdata::proteomics()
[1] "MRM-standmix-5.mzML.gz"
[2] "MS3TMT10_01022016_32917-33481.mzML.gz"
[3] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz"
[4] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz" |
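A directory path on its own is a single string, not a vector of file names. One way to build such a vector in base R is `list.files()` with `full.names = TRUE`. The sketch below uses a temporary directory with empty placeholder files so that it runs anywhere. The file names and the pairing of each mzXML file with an mzid file of the same base name are assumptions, not something stated in this thread.

```r
# Temporary directory with placeholder files (illustrative names only).
demo_dir <- tempfile("mzxml_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("batch1.mzXML", "batch2.mzXML",
                                  "batch1.mzid", "notes.txt")))

# Vector of full paths to the raw files, sorted alphabetically.
raw_files <- list.files(demo_dir, pattern = "\\.mzXML$", full.names = TRUE)

# Assumed naming convention: identification file shares the base name.
id_files <- sub("\\.mzXML$", ".mzid", raw_files)
has_id <- file.exists(id_files)

# With MSnbase installed and real files, the vector can be passed on:
# rw <- MSnbase::readMSData(raw_files, mode = "onDisk")
```

Checking `has_id` before adding identification data makes it obvious early on when a run has no matching mzid file.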
171004_hela_100ng.zip
It seems as if the number n is, in some instances, limited and cannot go beyond, say, 5000. Since both the standard and the recombinant protein are "pure", ions elute at least 29 min into the run, so I expected the first few spectra to be empty. However, I cannot access the spectra after 10000, which I suspect contain the compounds of interest.
I do not understand why I have to provide reporter ions to the above piece of code. I also do not understand why it says there is no scoring information available when the mzID file has "Proteome Discoverer Delta Scores", which I assume are the scores. Here is a link to the raw file on Dropbox. The mzID file has been attached directly to this thread. https://www.dropbox.com/s/6ho97icbpikoqyp/171004_hela_100ng.raw?dl=0 |
I have followed the instructions at the following link for installing and using MSnbase:
http://bioconductor.org/packages/release/bioc/vignettes/MSnbase/inst/doc/MSnbase-demo.pdf
This is the code that I have used for the session:
This is the error message I get:
SessionInfo