not properly reading fdata on some MRM samples #486

Closed
jmbadia opened this issue Oct 19, 2019 · 9 comments

Comments

@jmbadia

jmbadia commented Oct 19, 2019

Hello,
I am using MSnbase to read an mzML file (attached: 27076.1.zip) acquired in MRM mode on a QqQ instrument. The file contains some chromatograms with identical polarity, precursor and product m/z values, but with different collision energies:

<chromatogram index="3" id="SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333"...>...<...name="collision energy" value="15.0">
<chromatogram index="4" id="SRM SIC Q1=380 Q3=263.996 start=2.0044 end=6.494416667" ....>...<...name="collision energy" value="7.0">
<chromatogram index="5" id="SRM SIC Q1=380 Q3=263.996 start=2.005283333 end=6.495316667"...>...<...name="collision energy" value="25.0">

I cannot read their feature data correctly: fData() returns the data of the first chromatogram repeated three times, so I cannot get the fData of the other chromatograms (specifically, I need their collision energies).

library(MSnbase)
Chroms <- readSRMData("27076.1.mzML")
fData(Chroms)[5:7, c("chromatogramId", "precursorCollisionEnergy")]

chromatogramId
5 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333
6 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333
7 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333
precursorCollisionEnergy
5 15
6 15
7 15

Thanks in advance

@jorainer
Collaborator

For this I have to dig a little deeper into the respective (C++) code in the mzR package. I guess there might be a problem with extracting a unique chromatogram ID from the id string in the chromatogram header of the mzML file.

@jorainer
Collaborator

So, mzR is reading the files correctly:

> head(mzR::chromatogramHeader(fd))
                                               chromatogramId chromatogramIndex
1                                                         TIC                 1
2 SRM SIC Q1=300 Q3=263.996 start=2.006183333 end=6.496216667                 2
3 SRM SIC Q1=300 Q3=281.996 start=2.006183333 end=6.496216667                 3
4 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333                 4
5      SRM SIC Q1=380 Q3=263.996 start=2.0044 end=6.494416667                 5
6 SRM SIC Q1=380 Q3=263.996 start=2.005283333 end=6.495316667                 6
  polarity precursorIsolationWindowTargetMZ precursorIsolationWindowLowerOffset
1       -1                               NA                                  NA
2        1                              300                                  NA
3        1                              300                                  NA
4        1                              380                                  NA
5        1                              380                                  NA
6        1                              380                                  NA
  precursorIsolationWindowUpperOffset precursorCollisionEnergy
1                                  NA                       NA
2                                  NA                       15
3                                  NA                       15
4                                  NA                       15
5                                  NA                        7
6                                  NA                       25
  productIsolationWindowTargetMZ productIsolationWindowLowerOffset
1                             NA                                NA
2                        263.996                                NA
3                        281.996                                NA
4                        263.996                                NA
5                        263.996                                NA
6                        263.996                                NA
  productIsolationWindowUpperOffset
1                                NA
2                                NA
3                                NA
4                                NA
5                                NA
6                                NA

So the problem is actually in the MSnbase function that generates the Chromatograms object. I will have a look into that.

jorainer added a commit that referenced this issue Oct 23, 2019
- Consider also the precursor collision energy in the generation of unique
  identifiers for each chromatogram across input files (fixes issue #486).
- Update relevant documentation.
@jmbadia
Author

jmbadia commented Oct 23, 2019

Is this related to the warning that appears when you create the Chromatograms object?

readSRMData("27062.1.mzML")
Warning messages:
1: file 27062.1.mzML contains multiple chromatograms with identical polarity, precursor and product m/z values 
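The warning flags chromatograms that share polarity, precursor m/z and product m/z. Below is a minimal base-R sketch of such a check, assuming a data frame with the same column names as the `mzR::chromatogramHeader()` output shown earlier in this thread (values copied from its rows 4 to 6). The helper name `find_duplicated_transitions` is hypothetical; this is not MSnbase's actual implementation.

```r
# Hypothetical helper, not part of MSnbase: flags chromatograms whose
# polarity, precursor m/z and product m/z are identical to another one.
find_duplicated_transitions <- function(hdr) {
  cols <- c("polarity", "precursorIsolationWindowTargetMZ",
            "productIsolationWindowTargetMZ")
  key <- do.call(paste, c(hdr[cols], sep = "_"))
  # TRUE for every member of a duplicated group, not only the later copies
  duplicated(key) | duplicated(key, fromLast = TRUE)
}

# Rows 4 to 6 of the chromatogramHeader() output shown earlier
hdr <- data.frame(
  chromatogramIndex = 4:6,
  polarity = c(1, 1, 1),
  precursorIsolationWindowTargetMZ = c(380, 380, 380),
  productIsolationWindowTargetMZ = c(263.996, 263.996, 263.996),
  precursorCollisionEnergy = c(15, 7, 25)
)

dups <- find_duplicated_transitions(hdr)
if (any(dups))
  warning("multiple chromatograms with identical polarity, ",
          "precursor and product m/z values")
```

Because the collision energy is not part of the key, all three chromatograms end up in the same group.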

@jorainer
Collaborator

That is exactly the problem. I was generating a unique identifier for each chromatogram based on the available metadata, but forgot to consider the collision energy. The fixes for this are in #487; I hope @lgatto finds the time to merge them and push them to Bioconductor.
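The idea behind the fix (adding the collision energy to the metadata used for the identifier) can be illustrated with a small base-R sketch. The helper `chrom_key` and its `use_ce` argument are hypothetical and only mimic the approach; the actual code in #487 may build the identifier differently. The values are those of rows 4 to 6 of the `chromatogramHeader()` output shown earlier.

```r
# Hypothetical helper (not MSnbase's code): builds one identifier per
# chromatogram from its metadata, optionally including collision energy.
chrom_key <- function(hdr, use_ce = TRUE) {
  cols <- c("polarity", "precursorIsolationWindowTargetMZ",
            "productIsolationWindowTargetMZ")
  if (use_ce)
    cols <- c(cols, "precursorCollisionEnergy")
  do.call(paste, c(hdr[cols], sep = "_"))
}

hdr <- data.frame(
  polarity = c(1, 1, 1),
  precursorIsolationWindowTargetMZ = c(380, 380, 380),
  productIsolationWindowTargetMZ = c(263.996, 263.996, 263.996),
  precursorCollisionEnergy = c(15, 7, 25)
)

key_old <- chrom_key(hdr, use_ce = FALSE)
key_new <- chrom_key(hdr, use_ce = TRUE)
anyDuplicated(key_old)  # 2: the three chromatograms collapse onto one key
anyDuplicated(key_new)  # 0: every chromatogram gets its own key
```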

@jmbadia
Author

jmbadia commented Oct 24, 2019

Thank you very much @jorainer, this definitely solves my problem.

Just to contribute something: here is a rare case that could still cause an error. If someone acquired different SRMs with identical metadata (Q1, Q3, polarity and CE) but different time ranges (very unusual, I guess), the chromatogramId would be the same for all of these SRMs (with misleading start/end times). Why not use a separate identifier for each chromatogram in a file instead of grouping them under a unique identifier built from their metadata?

@jorainer
Collaborator

There were two main reasons: 1) I wanted to build the identifier from the variables that are returned for the chromatograms. The start and end times are not (yet?) reported by mzR, and extracting them from the ID string can be tricky because I guess the ID format is vendor-specific (i.e. each vendor uses a different format). 2) I simply was not aware that something like this could happen. If you think this could become common, we definitely have to look into ways to fix it.

Note: these identifiers are actually only used to match SRMs across mzML files.
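To illustrate what "matching SRMs across files" means, here is a hedged base-R sketch. It assumes two header tables with the `chromatogramHeader()` column names. `file_b` is a made-up reordering of `file_a`, and the helper `chrom_key` is hypothetical. `match()` then gives, for each chromatogram in the first file, its position in the second.

```r
# Hypothetical helper: metadata key per chromatogram (including CE)
chrom_key <- function(hdr) {
  cols <- c("polarity", "precursorIsolationWindowTargetMZ",
            "productIsolationWindowTargetMZ", "precursorCollisionEnergy")
  do.call(paste, c(hdr[cols], sep = "_"))
}

# Illustrative headers; file_b lists the same transitions in another order
file_a <- data.frame(
  polarity = c(1, 1, 1),
  precursorIsolationWindowTargetMZ = c(380, 380, 380),
  productIsolationWindowTargetMZ = c(263.996, 263.996, 263.996),
  precursorCollisionEnergy = c(15, 7, 25)
)
file_b <- file_a[c(3, 1, 2), ]

# Position in file_b of each chromatogram of file_a
idx <- match(chrom_key(file_a), chrom_key(file_b))
idx  # 2 3 1
```

If two chromatograms in one file share all four values (the rare case with different time ranges raised above), `match()` can only return the first of them, so such a key cannot tell them apart.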

To fix your problem, you can install the current development version of Bioconductor by calling BiocManager::install(version = "devel"), followed by devtools::install_github("lgatto/MSnbase"). That will install the fixed version. Note that one week from now this development version of Bioconductor will be released as version 3.10.

@jmbadia
Author

jmbadia commented Oct 24, 2019

Thanks for your detailed answer. No, such working conditions seem hardly possible in practice. And I see your point: the identifier is just an ID, not a source of features.
I will install your devel version and give you proper feedback.

@jorainer
Collaborator

Thanks for the feedback! Any suggestions (or even better, contributions :) ) are highly welcome!

@jmbadia
Author

jmbadia commented Nov 4, 2019

It works perfectly. Same Q1, Q3 and polarity + different CE => different chromatogramId and different characteristics. Thanks so much for your help and consideration. If I have any suggestions (or contributions!!), I'll let you know.

chromatogramId
5 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333
6 SRM SIC Q1=380 Q3=263.996 start=2.005283333 end=6.495316667
7      SRM SIC Q1=380 Q3=263.996 start=2.0044 end=6.494416667
  chromatogramIndex polarity precursorIsolationWindowTargetMZ
5                 4        1                              380
6                 6        1                              380
7                 5        1                              380
  precursorIsolationWindowLowerOffset
5                                  NA
6                                  NA
7                                  NA
  precursorIsolationWindowUpperOffset precursorCollisionEnergy
5                                  NA                       15
6                                  NA                       25
7                                  NA                        7
  productIsolationWindowTargetMZ productIsolationWindowLowerOffset
5                        263.996                                NA
6                        263.996                                NA
7                        263.996                                NA
  productIsolationWindowUpperOffset
5                                NA
6                                NA
7                                NA

@jmbadia jmbadia closed this as completed Nov 4, 2019