Link MS2 to MS1, MS3 to MS2 and MSn to MS(n-1) spectra #282

Closed
sgibb opened this issue Nov 29, 2017 · 28 comments · Fixed by #288

@sgibb
Collaborator

sgibb commented Nov 29, 2017

Last week @pavel-shliaha asked me to check whether a new neutral loss method for phosphopeptides is working as expected. To do so, I had to find out which MS3 spectrum was triggered by which MS2 spectrum. I did that by matching the precursorMz used for the MS2 scan, taken from the filterString (the MS3 scan's filterString contains the precursors for both MS2 and MS3; that was one of the reasons for implementing it in mzR):

fData(ms2)$filterString[31:36]
# [1] "ITMS + c NSI t d Full ms2 893.7355@cid35.00 [241.0000-1798.0000]"
# [2] "ITMS + c NSI t d Full ms2 1078.4039@cid35.00 [291.0000-2000.0000]"
# [3] "ITMS + c NSI t d Full ms2 993.4641@cid35.00 [268.0000-1997.0000]"
# [4] "ITMS + c NSI t d Full ms2 584.0798@cid35.00 [155.0000-1179.0000]"
# [5] "ITMS + c NSI t d Full ms2 933.4402@cid35.00 [251.0000-1877.0000]"
# [6] "ITMS + c NSI t d Full ms2 978.3182@cid35.00 [264.0000-1967.0000]"
fData(ms3)$filterString[1:5]
# [1] "FTMS + p NSI d Full ms3 933.4402@cid35.00 915.4110@hcd30.00 [100.0000-2000.0000]"
# [2] "FTMS + p NSI d Full ms3 850.0895@cid35.00 752.1000@hcd30.00 [100.0000-2000.0000]"
# [3] "FTMS + p NSI d Full ms3 644.5992@cid35.00 603.6380@hcd30.00 [100.0000-2000.0000]"
# [4] "FTMS + p NSI d Full ms3 699.0558@cid35.00 666.1801@hcd30.00 [100.0000-2000.0000]"
# [5] "FTMS + p NSI d Full ms3 636.8463@cid35.00 619.0060@hcd30.00 [100.0000-2000.0000]"

I did the matching as follows:

## create data.frames for MS2 and MS3 with scanId and precursorMz
ms2df <- data.frame(scanId=scanIndex(ms2),
                    pcCid=gsub(".*ms2 *([0-9.]+@cid[0-9.]+) .*", "\\1",
                               fData(ms2)$filterString),
                    stringsAsFactors=FALSE)
ms3df <- data.frame(scanId=scanIndex(ms3),
                    pcCid=gsub(".*ms3 *([0-9.]+@cid[0-9.]+) .*", "\\1",
                               fData(ms3)$filterString),
                    stringsAsFactors=FALSE)

## merge them by precursorMz
nldf <- merge(ms3df, ms2df,
              all.x=TRUE, all.y=FALSE,
              sort=FALSE, by="pcCid", suffixes=c(".ms3", ".ms2"))

## calculate differences in scan Ids
nldf$delta <- nldf$scanId.ms3 - nldf$scanId.ms2

## remove all scans where the MS2 was acquired after the MS3
## (which() also drops MS3 scans without a matching MS2, where delta is NA)
nldf <- nldf[which(nldf$delta > 0),]

## for each MS3 keep just the closest preceding MS2 (smallest delta),
## if there are several candidates
nldf <- nldf[order(nldf$scanId.ms3, nldf$delta),]
nldf <- nldf[!duplicated(nldf$scanId.ms3),]

See the whole example at https://rawgit.com/sgibb/talk-odense-20171127/master/01-neutral-loss.html#compare-nl-trigger (unfortunately it doesn't contain the dataset because it is 2.2 Gb, but @pavel-shliaha promised me that he would share this dataset if needed).

While this procedure isn't very complicated we should provide a function or something else to allow an easier reference for each MSn to its corresponding MS(n-1).
I am not sure that the filterString approach will always work. Maybe we have to go through all precursorMz or some other information. Also I am not sure whether it would be enough to add another column to the featureData slot (e.g. parentMsScanId) or we need to add it to the Spectrum class.
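As a rough illustration of the parentMsScanId idea, here is a minimal base R sketch. It assumes a plain data.frame standing in for the featureData; the helper names `firstPrecursor` and `addParentMsScanId` are hypothetical, the scan numbers are made up, and only the filterStrings are taken from the examples above. It is not how MSnbase does (or should do) it.

```r
## Toy feature data; filterStrings copied from the examples above,
## scanId values are made up for illustration.
fd <- data.frame(
  scanId = c(31L, 35L, 40L),
  msLevel = c(2L, 2L, 3L),
  filterString = c(
    "ITMS + c NSI t d Full ms2 893.7355@cid35.00 [241.0000-1798.0000]",
    "ITMS + c NSI t d Full ms2 933.4402@cid35.00 [251.0000-1877.0000]",
    "FTMS + p NSI d Full ms3 933.4402@cid35.00 915.4110@hcd30.00 [100.0000-2000.0000]"
  ),
  stringsAsFactors = FALSE
)

## Hypothetical helper: extract the first precursor entry
## (e.g. "933.4402@cid35.00") that follows the "msN" token.
firstPrecursor <- function(x) {
  sub(".*ms[0-9]+ +([0-9.]+@[a-z]+[0-9.]+) .*", "\\1", x)
}

## Hypothetical helper: for each MS3, find the closest preceding MS2
## that shares the first precursor entry; NA if there is none.
addParentMsScanId <- function(fd) {
  key <- firstPrecursor(fd$filterString)
  fd$parentMsScanId <- NA_integer_
  for (i in which(fd$msLevel == 3L)) {
    cand <- fd$scanId[fd$msLevel == 2L & key == key[i] &
                      fd$scanId < fd$scanId[i]]
    if (length(cand)) fd$parentMsScanId[i] <- max(cand)
  }
  fd
}

res <- addParentMsScanId(fd)
res[, c("scanId", "msLevel", "parentMsScanId")]
```

Like the merge above, this only works while the filterString has the Thermo layout shown here, so it would need a fallback (e.g. precursorMz) for other files.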

Do you have any better ideas? Do you think we need this kind of reference?

@jorainer
Collaborator

As far as I understand, the filterString is specific to Thermo instruments. I recently also added the spectrumId header column. Some manufacturers seem to encode settings in the spectrum ID. We should check whether we could extract some things from there too.

@lgatto
Owner

lgatto commented Nov 30, 2017

MS3 is, as far as I know, Thermo specific anyway, so it would probably be a good way forward. Matching spectra from different levels is definitely very high priority.

@sgibb
Collaborator Author

sgibb commented Nov 30, 2017

@jotsetung I have no idea whether filterString is Thermo-specific or not.

The msdata package already contains a file with MS3 (but it is of course also from Thermo):

library("mzR")

f <- msdata::proteomics("MS3TMT10_01022016_32917-33481.mzML.gz",
                        full.names=TRUE)
fh <- openMSfile(f)

header(fh)[c(1, 4, 6),]
#   seqNum acquisitionNum msLevel polarity peaksCount totIonCurrent retentionTime
# 1      1          32918       1        1      48304    3005937408      4422.620
# 4      4          32921       2        1        765       1220570      4422.735
# 6      6          32923       3        1       2540       2522363      4423.036
#   basePeakMZ basePeakIntensity collisionEnergy ionisationEnergy    lowMZ
# 1   696.7169       172620640.0               0                0 376.2217
# 4   549.6967          103048.6              35                0 200.2342
# 6   129.1378          275641.0              65                0  99.0060
#      highMZ precursorScanNum precursorMZ precursorCharge precursorIntensity
# 1 1515.1263                0      0.0000               0                  0
# 4 1596.3099            32918    673.3353               3            7623700
# 6  505.0462                0    490.2629               0                  0
#   mergedScan mergedResultScanNum mergedResultStartScanNum
# 1          0                   0                        0
# 4          0                   0                        0
# 6          0                   0                        0
#   mergedResultEndScanNum injectionTime
# 1                      0  0.0008102598
# 4                      0  0.0123253975
# 6                      0  0.0520793648
#                                                                          filterString
# 1                                           FTMS + p NSI Full ms [380.0000-1500.0000]
# 4                    ITMS + c NSI t d Full ms2 673.6693@cid35.00 [180.0000-2000.0000]
# 6 FTMS + p NSI sps d Full ms3 673.6693@cid35.00 490.2629@hcd65.00 [100.0000-500.0000]
#                                       spectrumId
# 1 controllerType=0 controllerNumber=1 scan=32918
# 4 controllerType=0 controllerNumber=1 scan=32921
# 6 controllerType=0 controllerNumber=1 scan=32923

As you can see there is always at least a difference of 2 in the scan indices of MS2 and MS3 (but could be even larger). So there is no way to just use the scanId. At least in mzML generated from Thermo the spectrumId is quite useless. There is a precursorScanNum entry (and I first thought it would be easy to look just at this column) but it is quite typical for Thermo that they don't set it for msLevel > 2 ...

While investigating this problem a little bit more I noticed that the precursorScanNum is sometimes set correctly. Maybe Thermo fixed it, e.g. I have files that were generated with different versions of Xcalibur:

  • Xcalibur 2.0.1258.15: precursorScanNum == 0 for msLevel > 2.
  • Xcalibur 3.0.2022.16: precursorScanNum != 0 for msLevel > 2 and they seem to correspond to the information in filterString.

I don't think we should implement workarounds for Xcalibur bugs. So my implementation to get the link information wouldn't be needed and we could just throw a warning if precursorScanNum == 0 for msLevel > 2 (with a hint to update Xcalibur).
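Such a check could look roughly like the following base R sketch. It assumes a data.frame with the column names returned by mzR's header() (values copied from rows 1, 4 and 6 above); the function name `checkPrecursorScanNum` and the wording of the warning are hypothetical.

```r
## Toy header; values copied from rows 1, 4 and 6 of the output above.
hdr <- data.frame(
  acquisitionNum = c(32918L, 32921L, 32923L),
  msLevel = c(1L, 2L, 3L),
  precursorScanNum = c(0L, 32918L, 0L)
)

## Hypothetical helper: warn if any spectrum with msLevel > 2 has no
## precursorScanNum; invisibly returns TRUE if all are set.
checkPrecursorScanNum <- function(hdr) {
  unset <- hdr$msLevel > 2L & hdr$precursorScanNum == 0L
  if (any(unset)) {
    warning(sum(unset), " spectra with msLevel > 2 have ",
            "precursorScanNum == 0; the acquisition software ",
            "may need an update.", call. = FALSE)
  }
  invisible(!any(unset))
}

## For the msdata example this raises the warning (suppressed here).
ok <- suppressWarnings(checkPrecursorScanNum(hdr))
```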

But we should provide some high level functions for the generic user to get the parent scans or all children of a current scan, e.g.:

  • filterPrecursorScan(msnexp, precursorScanId): just keep spectra that are generated from precursorScanId (across all msLevels).
  • filterChildScan(msnexp, childScanId): just keep spectra that are the precursor of childScanId.

I am not happy with these names because it is hard to understand in which direction they apply the filtering.

@jorainer
Collaborator

jorainer commented Dec 1, 2017

OK, thanks for checking the scanId.
Agree, we should just stick to the precursorScanNum and use that.

re filter function names, what about just a single filterPrecursorScan method with an additional argument e.g. keep = c("children", "parents")?

@sgibb sgibb self-assigned this Dec 5, 2017
@sgibb
Collaborator Author

sgibb commented Dec 5, 2017

filterPrecursorScan(msnexp, scanId) will return an [OnDisk]MSnExp object with all the parents and children of scanId. Filtering could be done afterwards with filterMsLevel.
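To make the intended semantics concrete, here is a toy base R sketch of "all parents and children of scanId". It works on a plain data.frame with acquisitionNum and precursorScanNum columns (hypothetical scan numbers) and assumes precursorScanNum is set correctly; `relatedScans` is a hypothetical helper, not the MSnbase implementation.

```r
## Toy header: one MS1, two MS2 and one MS3 (hypothetical scan numbers).
hdr <- data.frame(
  acquisitionNum = c(1L, 2L, 3L, 4L),
  msLevel = c(1L, 2L, 2L, 3L),
  precursorScanNum = c(0L, 1L, 1L, 3L)
)

## Hypothetical helper: acquisition numbers of scanId, all its
## parents (up to MS1) and all its children (down to the highest MSn).
relatedScans <- function(hdr, scanId) {
  ## walk up: follow precursorScanNum until it is unset (0) or unknown
  up <- integer()
  cur <- scanId
  repeat {
    p <- hdr$precursorScanNum[match(cur, hdr$acquisitionNum)]
    if (is.na(p) || p == 0L) break
    up <- c(up, p)
    cur <- p
  }
  ## walk down: repeatedly collect scans whose precursor was just found
  down <- integer()
  cur <- scanId
  while (length(cur)) {
    cur <- hdr$acquisitionNum[hdr$precursorScanNum %in% cur]
    down <- c(down, cur)
  }
  sort(unique(c(up, scanId, down)))
}

## Keep the MS2 scan 3 together with its MS1 parent and its MS3 child.
keep <- hdr[hdr$acquisitionNum %in% relatedScans(hdr, 3L), ]
```

Restricting the result to one direction would then just be a subset on msLevel, which is what filtering with filterMsLevel afterwards amounts to.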

@lgatto
Owner

lgatto commented Dec 20, 2017

@sgibb, could you clarify what you mean by

I have files that were generated with different versions of Xcalibur:

  • Xcalibur 2.0.1258.15: precursorScanNum == 0 for msLevel > 2.
  • Xcalibur 3.0.2022.16: precursorScanNum != 0 for msLevel > 2 and they seem to correspond to the information in filterString.

I have a very recent MS3 file, produced on an instrument that has a computer with Xcalibur 3.0.64, and my hopes were still high at that point. I then took that raw file to a computer with the latest ProteoWizard (version 3.0.11626, freshly downloaded and installed) and converted it. The mzML file then lists Xcalibur 2.1.1565.24 and pwiz 3.0.7364! And, needless to say, the precursorScanNum is 0 for MS levels > 2.

There is an old version of Xcalibur on that computer (version 2.2.44, from June 2011), but as that version doesn't match what I see in the mzML file, I conclude that the local Xcalibur isn't used.

For completeness, I also converted the file on another computer that has Xcalibur 3.0.63 (same as on the mass spec computer) and pwiz 3.0.999.2; that mzML file listed Xcalibur 2.1.1565.24 and pwiz 3.0.9987.

In a nutshell, ProteoWizard seems to be way behind in terms of Xcalibur (or probably MSFileReader) libraries, and isn't internally consistent. God I hate proteomics!

So, @sgibb, where did you get mzML files that used a recent version of Xcalibur and had these bloody precursorScanNum values for ms levels > 2? Can @pavel-shliaha tell me how he converted them?

@sgibb
Collaborator Author

sgibb commented Dec 20, 2017

Mh, I got that mzML file (with precursorScanNum != 0 for msLevel > 2) from @pavel-shliaha. I thought it was converted by pwiz + Xcalibur as well:

    <softwareList count="2">
      <software id="Xcalibur" version="3.0.2022.16">
        <cvParam cvRef="MS" accession="MS:1000532" name="Xcalibur" value=""/>
      </software>
      <software id="pwiz" version="3.0.10107">
        <cvParam cvRef="MS" accession="MS:1000615" name="ProteoWizard software" value=""/>
      </software>
    </softwareList>

I have no idea how to tell pwiz to use a different Xcalibur version. Maybe @pavel-shliaha could help us here?
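For comparing files, the versions recorded in the softwareList can be pulled out with a few lines of base R. This is only a sketch: it applies regular expressions to the snippet above held in a character vector, which is fragile for XML in general (a proper XML parser would be preferable), and the variable names are hypothetical.

```r
## The softwareList block from above as a character vector, one line each.
mzml <- c(
  '<softwareList count="2">',
  '  <software id="Xcalibur" version="3.0.2022.16">',
  '    <cvParam cvRef="MS" accession="MS:1000532" name="Xcalibur" value=""/>',
  '  </software>',
  '  <software id="pwiz" version="3.0.10107">',
  '    <cvParam cvRef="MS" accession="MS:1000615" name="ProteoWizard software" value=""/>',
  '  </software>',
  '</softwareList>'
)

## Keep only the opening <software ...> tags (not <softwareList>).
sw <- grep("<software ", mzml, value = TRUE, fixed = TRUE)

## Named character vector: names are the ids, values are the versions.
versions <- setNames(
  sub('.*version="([^"]*)".*', "\\1", sw),
  sub('.*id="([^"]*)".*', "\\1", sw)
)
versions
```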

@pavel-shliaha

The computer running the LUMOS machine that produced the raw file used Xcalibur version 4.1.31.9.

The computer processing the files used:

Xcalibur 3.0.63
pwiz 3.0.10107.

About MS3: this can only be performed on ion trap instruments. As far as I know these instruments are currently manufactured by Thermo (LUMOS, FUSION, VELOS, ELITE, Orbitrap XL), Bruker and Shimadzu.

Please let me know if you need any more information. Do you perhaps need the raw file in question?

@lgatto
Owner

lgatto commented Dec 20, 2017

Thanks @pavel-shliaha. When you say

The computer processing the files used:

Xcalibur 3.0.63
pwiz 3.0.10107.

do you mean that this is what you see in the mzML files? Was Xcalibur installed on that processing computer? If so, what version does it have?

Have you done anything specific to get Xcalibur 3.0.63 to be used by pwiz during the conversion?

@pavel-shliaha

Yes, the "processing computer" is the one used to generate the mzML file and the software installed on it was:

Xcalibur 3.0.63
pwiz 3.0.10107.

The computer hooked up to the LUMOS instrument that generated the file used version 4.1.31.9. After the data was acquired, I transferred the data to the processing computer. I have not done anything specific to pwiz, just did a normal mzML conversion.

Hope this helps.

@lgatto
Owner

lgatto commented Dec 20, 2017

Ok, thank you for the clarification.

My only explanation is that Xcalibur 4.1.31.9 is needed on the Lumos computer, because otherwise the setups on our processing computers are pretty much the same. @pavel-shliaha, is that possible based on when you started to get mzML files with complete precursorScanNum data?

@pavel-shliaha

pavel-shliaha commented Dec 20, 2017

@lgatto if I understand correctly, Sebastian suggests that precursorScanNum is working for MS3 in the file I gave him, but not in other files he tested. And your theory is this is not an export bug of the pwiz or Xcalibur installed on the processing computer, but the problem with the raw files themselves?

That could be true, provided we used the very latest Xcalibur software on our LUMOS computers, that was just recently updated (we got the latest version for our UVPD work).

It might also be that Thermo fixed the problem earlier when it released SPS on FUSION, since PD (their software) is capable of doing SPS MS3 quantitation, hence it must match MS2 and MS3 spectra. I can look into it, if you haven't yet. I am not sure what instrument was used in the example from msdata.

@lgatto
Owner

lgatto commented Dec 20, 2017

And your theory is this is not an export bug of the pwiz or Xcalibur installed on the processing computer, but the problem with the raw files themselves?

This is how I explain it: a file acquired today on the Lumos running Xcalibur 3.0.64, and converted using pwiz 3.0.999.2 on a computer with Xcalibur 3.0.64 installed, failed to produce a file with a complete precursorScanNum. I can try to install pwiz 3.0.11626 to make sure, but the major difference seems to be what is running on the acquisition computer.

PD (their software) is capable of doing SPS MS3 quantitation

Yes, PD does MS3 quantitation, but they have access to more data in the binary raw files.

What really strikes me as utterly weird is that when using the very latest pwiz, the mzML file tells me Xcalibur 2.1.1565.24 was used, which doesn't match any installed Xcalibur version (neither on the Lumos computer nor on the processing computer).

Any help would be greatly appreciated.

@lgatto
Owner

lgatto commented Dec 21, 2017

@sgibb - could you send me that file with the complete precursorScanNum, please, so that this isn't stalled for ages while we understand and then possibly even sort this fucking mess.

@pavel-shliaha

A couple of things: the software version that is listed for the file, "3.0.2022.16", is not the Xcalibur version on either the instrument computer or the processing computer.

It is the version of Tune used to generate the data. In fact 3.0.2022.16 is the version of Tune installed on our LUMOSes right now.

[Screenshot: tune_version_tune_page]

This is how to see the Tune version from the raw file in Xcalibur:

[Screenshot: tune_version]

@lgatto
Owner

lgatto commented Dec 21, 2017

Thank you, I think this is what I was missing. On our Lumos computer, it gives 2.1.1565.24. I bet this is why we don't get the precursorScanNum we are looking for. @pavel-shliaha, do you know how to update the tune?

@pavel-shliaha

You have to ask Thermo. However, they might refuse the update, since the new software is beta. We got it since we bought UVPD and 2.X does not support UVPD.

@lgatto
Owner

lgatto commented Dec 21, 2017

The wonderful world of fucking mass spectrometry hell.

What's UVPD?

@pavel-shliaha

UVPD is ultraviolet photodissociation. Basically you can break molecules by shining high-intensity light at them. This of course requires a laser. We have now had the laser installed inside the instrument; see the animation:

http://players.brightcove.net/665001591001/default_default/index.html?videoId=5449874193001

UVPD (as implemented by Thermo) is mostly beneficial for top-down proteomics, since it takes too long to break peptides (the Thermo laser is too weak).

@pavel-shliaha

pavel-shliaha commented Dec 21, 2017

Sebastian writes that the link to the MS3 file still works. Can you check that? I will also send an MS3 file from the previous version to test.

https://filesender.deic.dk/filesender/?vid=71c5a478-76c1-4e89-0a7f-000067d3acf9 (should be valid until 2017-12-27)

The MS3 file should be called

LUM2_01470_KS_L1-5-2_EC17-123-150In2_55oC_NLtrig.raw

Please note that the MS3 method in this file is NOT a normal TMT SPS MS3 method; it is a neutral loss trigger for phospho TMT. Please see this paper:

Sensitive and Accurate Quantitation of Phosphopeptides Using TMT Isobaric Labeling Technique

PS: I think it's the versions of Tune again that are incompatible.

@lgatto
Owner

lgatto commented Dec 21, 2017

Thanks @pavel-shliaha. Do you possibly have an MS3 file with TMT10 or 11 for regular quantitation, acquired with that recent version of Tune, that you could share?

@lgatto
Owner

lgatto commented Jan 3, 2018

Thank you for the file @pavel-shliaha! Do you happen to have the corresponding mzid file, or could you give me the species so that I can search it myself?

@lgatto
Owner

lgatto commented Jan 10, 2018

Ping @pavel-shliaha

Do you happen to have the corresponding mzid file, or could you give me the species so that I can search it myself?

@pavel-shliaha

@lgatto, sorry, I was travelling and just arrived yesterday. I searched this data with Mascot + Percolator in Proteome Discoverer 2.1. It does not return mzIdentML files (I know, since I had trouble submitting to MCP), but it does return a csv file with the quantitation already in place, which can be linked to particular scans.

@lgatto
Owner

lgatto commented Jan 10, 2018

Thanks. Could you send me that csv file, please, so that I can extract the information from there? I might still need an mzid file for testing purposes. What database did you use?

@pavel-shliaha

pavel-shliaha commented Jan 11, 2018

Where do you want the file? I can send it by email. I used human UniProt canonical and reviewed entries only; I can send that as well.

@lgatto
Owner

lgatto commented Jan 12, 2018

Both by email is fine, thanks.

@lgatto
Owner

lgatto commented Jan 14, 2018

Keeping open to add an example and section in the vignette using new data to be added to msdata.

@lgatto lgatto reopened this Jan 14, 2018
@lgatto lgatto closed this as completed May 16, 2018