Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Direct infusion data : MSnExp from combine spectra to WriteMSData #404

Open
drupadt opened this issue Jan 10, 2019 · 14 comments

Comments

@drupadt
Copy link

commented Jan 10, 2019

Hi,

I am using the package to let me look at direct infusion data. I know after last summer's discussions, combineSpectra has been modified and made more resourceful. I have been able to read an mzXML file in, get spectra out of it, combine them as base::mean and have single spectra (or so I think!). I've now converted that to MSnExp object but I want to use findChromPeaks_MSW in order to annotate these. I understand MSnExp signature isn't available for findChromPeaks and I'll need my MSnExp object to be converted to MSData.

I've tried to use writeMSData but don't think I can convert MSnExp into say mzxml file that will now contain a single spectrum?

@jorainer

This comment has been minimized.

Copy link
Collaborator

commented Jan 10, 2019

Dear @drupadt ,
you can subset the MSnExp to a single spectrum and it should then be possible to write this with writeMSData to a single-spectrum mzML file (although I have never tried that).

To subset the MSnExp to a single spectrum you would use the [ operator and using a single index or spectrum name. Example: assuming you have an MSnExp data you can extract an MSnExp with a single spectrum like detailed below. You have to know however which spectrum you want to extract. In the example I simply selected the first.

data_1 <- data[1]
@drupadt

This comment has been minimized.

Copy link
Author

commented Jan 10, 2019

Dear @drupadt ,
you can subset the MSnExp to a single spectrum and it should then be possible to write this with writeMSData to a single-spectrum mzML file (although I have never tried that).

To subset the MSnExp to a single spectrum you would use the [ operator and using a single index or spectrum name. Example: assuming you have an MSnExp data you can extract an MSnExp with a single spectrum like detailed below. You have to know however which spectrum you want to extract. In the example I simply selected the first.

data_1 <- data[1]

Hi @jotsetung - sorry I wasn't clear. I did not want to subset just one scan. Instead I want to combine the scans together into one single scan. I have 60 scans. I'm following up from a thread here - I have achieved the combine bit. Here is my code below. What I am now stuck with is taking the MSnExp object forward and making it a mzml file that can be read by XCMS.

library(xcms)
library(MassSpecWavelet)
____________________________________________________________________

file.prof = list.files("/Users/drupadtrivedi/R_WD/Test3/", recursive = TRUE, full.names = TRUE)

prof.files = readMSData(file.prof, 
                       pdata = NULL, 
                       msLevel. = 1,
                       verbose = isMSnbaseVerbose(), 
                       centroided. = NA, 
                       smoothed. = NA,
                       mode = "onDisk")

spec= Spectra(spectra(prof.files))


res = combineSpectra(spec, 
                     mzFun = base::mean, 
                     intensityFun = base::mean)

newexp = as(res, "MSnExp")

msw = MSWParam(scales = c(1,4,9,16,25),
               nearbyPeak = TRUE,
               winSize.noise = 100,
               SNR.method = "data.mean",
               snthresh = 2)

peaks = findChromPeaks(newexp, 
                       param = msw)

findChromPeaks doesn't of course work yet because it doesn't accept a MSnExp object (I think!)

@jorainer

This comment has been minimized.

Copy link
Collaborator

commented Jan 10, 2019

So the newexp contains only a single spectrum, is that correct? In that case you should be able to save that with writeMSData to an mzML file.

If you load that again with mode = "onDisk" you'll get an OnDiskMSnExp that you can use as input for the findChromPeaks.

@drupadt

This comment has been minimized.

Copy link
Author

commented Jan 10, 2019

Ah thanks! I get it now. I had assumed it was onDiskExp when I was reading it back! I'll close the issue now :)

@drupadt drupadt closed this Jan 10, 2019
@drupadt drupadt reopened this Jan 10, 2019
@drupadt

This comment has been minimized.

Copy link
Author

commented Jan 10, 2019

And am back! I guess I closed this to early. The hurdle that's stopped the train, now seems not to be something MSnbase or XCMS related but I'll see if there is an easier way to do it.
In the above script, when I have multiple files in the folder, even when I use the code as below, I end up getting one spectra for just 1 file. How do I make sure it knows, I have multiple files and hence am looking for averaging spectra from each file and put them back as separate spectrum per file? :

...
<snip>
spec= Spectra(spectra(prof.files))

ns = names(spec)
s = gsub("()/..+", "/1",ns)
u = unique(s)

res <- vector("list", length(u))
for (i in seq_along(u)) {
  w <- which(s==u[i])
res[[i]] = combineSpectra(spec, 
                     mzFun = base::mean, 
                     intensityFun = base::mean,
                     main = floor(length(w)/2L) + 1L, timeDomain=F)
}

newexp = as(res[[i]], "MSnExp")

</snip>
...
@jorainer

This comment has been minimized.

Copy link
Collaborator

commented Jan 11, 2019

The simplest solution I could think of is to split the OnDiskMSnExp by file, perform the spectrum combination on each of these and save them again. In any rate, using fromFile is safer than guessing that from the spectrum name.

od_list <- split(prof.files, fromFile(prof.files))

<loop over od_list to combine spectra etc>

Alternatively, it should also be possible to split the Spectra by file:

spec_list <- split(spec, fromFile(spec))

newexp <- lapply(spec_list, <do the combine spectra, create>)
@drupadt

This comment has been minimized.

Copy link
Author

commented Jan 11, 2019

The simplest solution I could think of is to split the OnDiskMSnExp by file, perform the spectrum combination on each of these and save them again. In any rate, using fromFile is safer than guessing that from the spectrum name.

od_list <- split(prof.files, fromFile(prof.files))

<loop over od_list to combine spectra etc>

Alternatively, it should also be possible to split the Spectra by file:

spec_list <- split(spec, fromFile(spec))

newexp <- lapply(spec_list, <do the combine spectra, create>)

Thanks for the suggestion. In trying to implement both, I end up with multiple lists instead of spectra and on trying to covert these into Spectra(spectra(od_list)) or Spectra(spectra(newexp)) doesn't seem to work.

@jorainer

This comment has been minimized.

Copy link
Collaborator

commented Jan 11, 2019

Right, in both cases you get a list, once a list of OnDiskMSnExp and once a list of Spectra. Let's stick with the second approach, what I thought as a possible solution would be:

spec_list <- split(spec, fromFile(spec))

newexp <- lapply(spec_list, function(z) {
    res <- combineSpectra(z, 
                         mzFun = base::mean, 
                         intensityFun = base::mean,
                         main = floor(length(w)/2L) + 1L, timeDomain=F)
    as(res, "MSnExp")
})

newexp should then be a list of MSnExp and you should then be able to save each of them

for (i in seq_along(newexp)) {
    writeMSData(newexp[[i]], file = paste0(i, ".mzML"))
}
@drupadt

This comment has been minimized.

Copy link
Author

commented Jan 11, 2019

Right, in both cases you get a list, once a list of OnDiskMSnExp and once a list of Spectra. Let's stick with the second approach, what I thought as a possible solution would be:

spec_list <- split(spec, fromFile(spec))

newexp <- lapply(spec_list, function(z) {
    res <- combineSpectra(z, 
                         mzFun = base::mean, 
                         intensityFun = base::mean,
                         main = floor(length(w)/2L) + 1L, timeDomain=F)
    as(res, "MSnExp")
})

newexp should then be a list of MSnExp and you should then be able to save each of them

for (i in seq_along(newexp)) {
    writeMSData(newexp[[i]], file = paste0(i, ".mzML"))
}

I like this approach. I think this will work. Just need to define w for the length(w) bit.

@etrh

This comment has been minimized.

Copy link

commented Jan 22, 2019

@drupadt @jotsetung, trying to do the same right now, can someone please tell me how I should define main in combineSpectra. What is w in the code snippets shared above?

@drupadt

This comment has been minimized.

Copy link
Author

commented Jan 22, 2019

I've not managed to do that with multiple attempts mainly due to lack of time. Meanwhile I've adapted the approach of manual binning to almost nominal mass and then working with single spectra data. The loss so far has been mass accuracy. But it has taken away majority of the work to get a single spectra in first place.

@jorainer

This comment has been minimized.

Copy link
Collaborator

commented Jan 23, 2019

@etrh and @drupadt , first of all, you should use MSnbase 2.8.3 (shipped with Bioconductor version 3.8 on R >= 3.5.1). Have then a look at ?combineSpectra, the functionality and arguments have been cleaned-up: fcol defines which spectra in a list of Spectrum object should be combined (all spectra will be combined if omitted) and fun defines the function to be used to combine the spectra. Have then a look to ?meanMzInts which is the function called to combine spectra. I suggest that you manually define an mzd value based on the resolution of your MS instrument and the maximal expected scattering of m/z values for the same ion in consecutive spectra. main simply defines the main spectrum. In the code above it was not defined correctly, because w is not present. It should be rather floor(length(z)/2L) + 1L with z being the list of spectra. This will select the middle spectrum as main spectrum.

Eventually you might also want to have a look at ?consensusSpectrum as an alternative for meanMzInts. Note also that in your case you do not need to use the combineSpectrum method. If you have already a list of Spectrum objects that you want to combine, simply use either the meanMzInts or consensusSpectrum functions directly. They will combine all Spectrum objects and return a single Spectrum.

@etrh

This comment has been minimized.

Copy link

commented Jan 29, 2019

Thank you for your detailed response, @jotsetung

I'm currently using MSnbase 2.8.3 and I have followed your comments, but I am still not sure whether I'm doing the right thing. I'm very new to direct infusion data and feel very lost among all the functions offered by xcms and MSnbase.

I have 10 samples measured the regular LC/MS way and the same set of samples measured using the direct infusion technique. I simply want to see what proportion of peaks are common between these two approaches (i.e. common m/z values among the two sets of data).

My direct infusion data contains 78 scans (10 files, each with 78 scans).

What I do right now to combine the 78 scans in each separate file is as follows:

file.prof <- list.files("Data/Direct_Infusion/", recursive = FALSE, full.names = TRUE)
###I have 10 files in the directory above

dir.create("Combined_Spectra")
for(i in 1:length(file.prof)){
  
  prof.files <- readMSData(file.prof[i], pdata = NULL, mode = "onDisk")
  spec <- Spectra(spectra(prof.files))
  
  comSpec <- combineSpectra(spec, timeDomain = F, mzd = 0.001, ppm = 1, weighted = TRUE)
  
  writeMSData(object = as(comSpec, "MSnExp"), file = paste0("Combined_Spectra/combined_", basename(file.prof[i])))
  
}

Does this make any sense? Am I doing the right thing combining my spectra in this way?

I proceed in this way:

readMSData(list.files("Combined_Spectra/", full.names = T), pdata = NULL, mode = "onDisk") -> combSpec

msw <- MSWParam(scales = c(1, 4, 9), nearbyPeak = TRUE, winSize.noise = 500, SNR.method = "data.mean", snthresh = 10, verboseColumns = TRUE, peakThr = 1000)

peaks <- findChromPeaks(combSpec, param = msw)

I am also wondering whether it would be best to use profile data when analyzing direct infusion data?

@jorainer

This comment has been minimized.

Copy link
Collaborator

commented Jan 30, 2019

I'm not the right person to answer questions related to direct infusion data, never analyzed that data so far.

Regarding the code to combine the spectra, that makes total sense to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.