Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unclear getMass() semantics #8

Open
sneumann opened this issue Jul 25, 2017 · 8 comments
Open

unclear getMass() semantics #8

sneumann opened this issue Jul 25, 2017 · 8 comments

Comments

@sneumann
Copy link
Owner

Reported by Dzmitry Mukha:

When I used getMass(getMolecule("C89H176O16P2")) in the Rdisop, I have got drastically wrong
monoisotopic mass with >1 Da difference. The result is also significantely different from the
anticipated average mass (1564.288). You can see below the result of my own function getmass(x)
that seems to be correct. I can send you more examples if needed.

Indeed, getMass() is reporting the mass of the most abundant isotope,
which is the same as the monoisotopic mass for individual element atoms.
However, for molecules, this is not neccessarily the monoisotopic mass,
which might be the expected result. For the molecule monoisotopic mass,
the sum of the monoisotopic element masses would have to be considered.

See also https://en.wikipedia.org/wiki/Mass_(mass_spectrometry)#Monoisotopic_mass
for definitions and examples.

@sneumann
Copy link
Owner Author

This would be a unit test for this case

test.getMoleculeWithPositiveCharge <- function() {
     checkEqualsNumeric(852.354928,
                        getMolecule("C49H56FeN4O6",initializePSE(),z=1)$exactmass,
                        tolerance = 0.00005)  
}

@sneumann
Copy link
Owner Author

sneumann commented Dec 12, 2017

Using the example by Dzmitry Mukha,
and exact mass 1563.2434 obtained from http://www.chemcalc.org/analyse?mf=C89H176O16P2&peakWidth=0.001&msResolution=1563243&resolution=0.001&referenceVersion=2013

test.getMoleculeWithLargeMass <- function() {
     checkEqualsNumeric(1563.2434,
                        getMolecule("C89H176O16P2",initializePSE(),z=1)$exactmass,
                        tolerance = 0.00005)  
}

Issue is that Rdisop reports 1564.247 as exact mass, which is the mass of the most abundant
(second) isotope:

> getMolecule("C89H176O16P2",initializePSE(),z=1)$isotopes
[[1]]
             [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 1563.2433642 1564.2467997 1565.2500834 1.566253e+03 1.567256e+03
[2,]    0.3503048    0.3581471    0.1923895 7.191911e-02 2.093433e-02

http://www.envipat.eawag.ch/index.php also agrees that the monoisotopic mass should be 1563.243363

@sneumann
Copy link
Owner Author

Comment on the code:
mass of an ComposedElement is obtained in
https://github.com/sneumann/Rdisop/blob/master/src/disop.cpp#L651
But I am unsure where in the C++ source this is actually calculated.

@cbroeckl
Copy link

cbroeckl commented Dec 12, 2017

Well, I have an R-only fix that will likely slow down the function, but works. I am pretty lost in the C code... I am decomposing the molecular formula using a function from the CHNOSZ package, but I am sure this can be ported over. Can't say it is an elegant fix, but this is how the calculations can be done. I will keep looking at the C code to see if I can find where this might be happening, to enable selection there.

getMolecule <- function(formula = "C49H56FeN4O6", elements = NULL, z = 0, maxisotopes=10) {
  # Use full PSE unless stated otherwise
  if (!is.list(elements) || length(elements)==0 ) {
    elements <- initializePSE()
  }
  

  # Remember ordering of element names,
  # but ensure list of elements is ordered
  # by mass
  element_order <- sapply(elements, function(x){x$name})
  elements <- elements[order(sapply(elements, function(x){x$mass}))]

  ## MONOISOTOPIC EXTRACTION
  mono<-sapply(1:length(elements), FUN = function(x) {
    m<-which.max(elements[[x]]$isotope$abundance)
    elements[[x]]$mass + (m-1) + elements[[x]]$isotope$mass[m]
  }
  )
  names(mono)<-sapply(1:length(elements), FUN = function(x) {elements[[x]]$name})
  
  # Call imslib to parse formula and calculate
  # masses and isotope pattern
  molecule <- .Call("getMolecule",
                    formula, elements, element_order,
                    z, maxisotopes,
                    PACKAGE="Rdisop")
  
  ## FIX MONOISOTOPIC HERE
  library('CHNOSZ')
  f<-makeup(formula)
  molecule$exactmass <- sum(mono[names(f)]*f)
  
  
  molecule
}

@cbroeckl
Copy link

Some observations:

  1. element.h: does not define getMass function, but line 157 appears to define the MONOISOTOPIC
    (MONOISOTOPIC is a typedef defined in line 52)
    mass. Line 169 suggests that the first isotope found with an abundance of > 0.5 is taken as the
    MONOISOTOPIC. It is unclear to me what happens if there are no isotopes > 0.5 abundance. Line 57 indicates that
    MONOISOTOPIC is initially set to 0.
    tests in R suggest that this is not the issue - getMolecule(formula = "C49H56FeN4O6") works, while
    "C89H176O16P2" does not. This indicates that the unusual Fe isotope patterns is not the source of the problem,
    but rather that the problem lies in the method used to calculate monoisotopic mass for a molecule
    It may be that the full isotope pattern is being calculated, then monoisotopic (incorrecly) assigned.

  2. in the disop.cpp file, line 248, decompositions_t appears to be selecting the first index rather than an explicit
    monoisotopic with this line "decomposer.getDecompositions(masses(0), error);"

  3. in the eispectraanalysis.cpp file, line 249, there is reference to:
    // updates molecule's isotope distribution to calculate its molecule's monoisotopic mass
    candidate_molecule.updateIsotopeDistribution();
    this updateIsotopeDistribution function could be problematic? I haven't looked into this in depth yet.

  4. also see: "decomposer.getAllDecompositions(integer_mono_mass)"
    Why is there frequent reference to integer monoisotopic mass? Could this rounding be the problem? Seems highly unlikely, given that we get decimals out.

I was thinking 1. initially, but I don't think that is the problem. Rather, I suspect 2 as the most likely right now.

@janlisec
Copy link
Contributor

This is an old discussion. However, I would like to contribute my two cents and make a suggestion to close it. :)
Firstly, I dont think that Rdisop does anything wrong here, as 'exact mass' ist not an equivalent term for 'monoisotopic mass'.
From the Wikipedia Link @sneumann posted above:

When an exact mass value is given without specifying an isotopic species, it normally refers to the most abundant isotopic species.

Hence, I would suggest to update the documentation to clarify this.

@cbroeckl already suggested a workaround function to compute the monoisotopic mass (MIM). While he used the CHNOSZ package, I would avoid another dependency and implement a solution to compute the MIM based on the PSE information already present in Rdisop.

Specifically, I would extend Rdisop by a new exported function getMono which accepts a vector of formulas (or alternatively a list of molecules, to be consistent with the other extractor functions) and returns the true MIMs.

@sneumann let me know if I shall prepare a PR implementing the above solution.

Bests [jan]

@sneumann
Copy link
Owner Author

Hi Jan, thanks for picking this up !
I am all for clear documentation, and also against growing dependencies where technically not necessary.
Yours, Steffen

@janlisec
Copy link
Contributor

I created a PR (v1.65.4). Once accepted, #8 could be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants