-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Errorously parsing Bruker *.fid #3
Comments
@hsiaoyi0504 thanks for your bug report and the attached files. readBrukerFlexData (the package used for fid file import) fails to calculate the m/z values here. |
OK, I got it, but I am curious about how you trial and error to figure out the format of Bruker? |
If you are interested you can find all I know about the format in this wiki: https://github.com/sgibb/readBrukerFlexData/wiki/acqu-file The important part is the time-of-flight to m/z conversion described in Titulaer et al 2006 and implemented in https://github.com/sgibb/readBrukerFlexData/blob/master/R/tof2mass-functions.R Do you use an autoflexTOF/TOF device (my data are from an ultraflexTOF/TOF)? It seems the deflector was not activated ( |
My colleague uses autoflex speed TOF/TOF, and I am not sure if he uses reflector mode. I will check that. |
I confirmed that we used reflector mode. |
I modified the I think I found the important difference. In all my own files the conversion from tof to m/z follows a quadratic equation:
where
The newer files contain an additional variable
It seems to have the following pattern:
Your file has the following string:
Which could be:
and maybe should be used in a 4-order-polynomial:
I won't have time to test it this year (have to work each night for 12 hours the next days). If you like you could try it. To read the tof data you could use: library("readBrukerFlexData")
f <- readBrukerFlexFile("0_P10/1/1Ref/fid", filterZeroIntensities=TRUE)
f$spectrum$tof
f$spectrum$mass # wrong mass
f$spectrum$intensity It would be great to support your file too. If the above is not working I am going to throw an |
Unfortunately this idea doesn't work: library("readBrukerFlexData")
library("readMzXmlData")
f <- readBrukerFlexFile("0_P10/1/1Ref/fid", filterZeroIntensities=TRUE)
m <- readMzXmlFile("P10_raw.mzXML")
A <- 6.1682474689786355
B <- sqrt(1e12/1164156.9998801197)
C <- 3884.868239229128
D <- -0.063083671657452864
E <- -34.362308485323794
mz <- m$spectrum$mass
tof <- f$spectrum$tof
y <- D * (sqrt(mz))^4 + E * (sqrt(mz))^3 + A * (sqrt(mz))^2 + B * sqrt(mz) + C - tof
png("quad.png")
plot(mz, y, xlab="mz")
dev.off() The difference between "the old" quadratic equation and the new unknown formula seems to be quadratic itself: d <- m$spectrum$mass - f$spectrum$mass
png("diff.png")
plot(m$spectrum$mass, d, type="b", xlab="m/z", ylab="difference")
dev.off() Any ideas? |
Hi There, I was looking into reverse engineering the Bruker file formats awhile back and although I ended up putting that in the "too much effort" basket for the time being I kept an eye on this repo to come back too at some point. Anyway I'd like to help if I can but I'm not quite sure how. I took a quick look at the data uploaded and you did already @sgibb, and taking what @hsiaoyi0504 said as an assumption (that the mzXML data is being read in correctly?), we can see that it is a quadratic relation to TOF,
I'm not sure how the coefficients match up to the parameters in the acqu file though, looking at the values that are apparently normally used,
the transformed |
Hi, Hope this helps! |
So it seems that we need to add this into the user docs? |
Well, either that or hopefully we can figure out a way to add another variable to the TOF equation, which would change the quadratic equation into the cubic one. |
I am going to try to figure out how to import this type of files correctly. If we don't get it, I am going to add an error if the @Armadilloa16 thanks for testing. The relation is Using the attached files (0_A20.zip, generated by an autoflex device) assuming a quadratic relationship works (as we already know): library("readBrukerFlexData")
library("readMzXmlData")
f <- readBrukerFlexFile("0_A20/1/1SLin/fid")
m <- readMzXmlFile("0_A20/1/1SLin/0_A20.mzXML")
mz <- m$spectrum$mass
tof <- f$spectrum$tof
# the NTBCal string:
# 19886 0.9999999999999999 268.44302617844 2597289.7995303 -0.004433520310037
cc <- f$metaData$calibrationConstants
cc[1] <- sqrt(1e12/cc[1])
df <- data.frame(tof=tof,
x=sqrt(mz))
m <- lm(tof ~ I(x^2) + I(x), data=df)
cbind(acqu=cc[c(2, 3, 1)], model=coef(m))
# acqu model
# c2 268.44302618 268.44302618
# c3 -0.00443352 -0.00443352
# c1 620.49715567 620.49715567 The same for the @hsiaoyi0504's files is not working, neither for the cubic nor the 4th-order-polynomial model: library("readBrukerFlexData")
library("readMzXmlData")
f <- readBrukerFlexFile("0_P10/1/1Ref/fid", filterZeroIntensities=TRUE)
m <- readMzXmlFile("P10_raw.mzXML")
mz <- m$spectrum$mass
tof <- f$spectrum$tof
# the NTBCal string:
# 48 0.25 3884.868239229128 1164156.9998801197 6.1682474689786355
# -0.063083671657452864 -34.362308485323794
df <- data.frame(tof=tof,
x=sqrt(mz))
lm(tof ~ I(x^2) + I(x), data=df)
#
# Call:
# lm(formula = tof ~ I(x^2) + I(x), data = df)
#
# Coefficients:
# (Intercept) I(x^2) I(x)
# -282.260 -1.054 1205.826
lm(tof ~ I(x^3) + I(x^2) + I(x), data=df)
#
# Call:
# lm(formula = tof ~ I(x^3) + I(x^2) + I(x), data = df)
#
# Coefficients:
# (Intercept) I(x^3) I(x^2) I(x)
# -1.313e+03 8.308e-02 -7.498e+00 1.357e+03
lm(tof ~ I(x^4) + I(x^3) + I(x^2) + I(x), data=df)
#
# Call:
# lm(formula = tof ~ I(x^4) + I(x^3) + I(x^2) + I(x), data = df)
#
# Coefficients:
# (Intercept) I(x^4) I(x^3) I(x^2) I(x)
# -3.632e+03 -9.276e-03 1.034e+00 -4.131e+01 1.842e+03 Could it be that it depends on the device? It is a TOF/TOF. So maybe there are different constants/formulas for each flight tube/mass analyzer? (resulting in an additive model?) |
Sorry I didn't have the time yet to look into it. And didn't in depth still. But let my just comment on one thing that seems very suspicious to me, which is the delay value: the first two examples (graphs) in this thread seem correct to me except for the mass shift. The shape is ok. That means, the problem lies in the 'delay' alone, which is the time between shot and recording data. So basically, when you want to record a spectrum from lower masses of 2000 Da, you shoot, and then just let pass by the ions with masses smaller than that. Just wait, until they passed. This is the delay. And when the delay is in fact 48 as in the 'problematic' cases, the shift in the result seems 'correct' to me. Perhaps we have to figure out how to find the correct delay? Because obviously other tools can do it. |
@recumbentbirder thanks for looking into this. In fact the delay seems very short but wouldn't a different delay just result in a linear mass shift? But the spectrum is stretched and shifted as well. |
@sgibb oh, yes! Sorry. See, I didn't look that deep into it. I don't know yet, when I have a little more time :-/ |
Let me contribute with a screenshot of said spectrum as displayed by Bruker's flexAnalysis: to be honest, of all the plots now, this seems the most credible to me. Considering the low delay value, translating directly in a low minimum mass. But this is just a matter of displaying it: I guess you set 'xlim'? But one interesting observation: in the plotted range, the maximum arbitrary unit intensity is shown as around 2000 by flexAnalysis for the maximum peak at 700 Da. Though the units are arbitrary (dependent on how many shots you sum up), the values in a spectrum file are 'absolute': there's normally no normalization done by default for plotting ... @hsiaoyi0504, how did you come up with the mzXML file? A Bruker software, or third party? |
My partner used the mzconvert in the proteowizard to convert the files. |
Sorry for the long delay. I just uploaded readBrukerFlexData 1.8.5 to CRAN. In this version readBrukerFlexData throws a warning if it founds the |
Same problem. Does anybody ask Bruker directly? |
Back in 2010 and in early 2011 when I started the development of |
Bruker's policy didn't change. The polynomial equation is confirmed. HPC is patented without any communication. I spent hours at understanding the question. I think I reached some conclusions (see 2nd next comment, thanks @sgibb for the equation coefficients). I need some help to finish... I would like a spectra with a fifth calibration coefficient not zero. The one of @hsiaoyi0504 seems problematic to me (next comment). I would like some feedback from whoever is listening here. Best, |
The file from @hsiaoyi0504 but still don't understand it. What is surprising is the shape of the transformation. I plotted the sqrt(m/z) from the mzXML versus tof and the curve at low tof is somehow odd for me. |
I didn't succeed in fitting it with the equation, but the equation is OK if the fifth coefficient is zero. This means the equation is cubic instead of 4th order. In fact, @sgibb has nearly get the correct equation except he switched the last two coefficients (D<=>E). Here is my code and a test file (matrix control). library("readBrukerFlexData")
library("readMzXmlData")
#> Warning: package 'readMzXmlData' was built under R version 4.0.2
# define file location and read
setwd("C:/data/active/tests/test-bruker/")
f <- readBrukerFlexFile("spectra_D23/0_D23/1/1SLin/fid")
#> Warning in .readAcquFile(fidFile = fidFile, verbose = verbose): File 'C:
#> \data\active\tests\test-bruker\spectra_D23\0_D23\1\1SLin\fid' seems to be empty
#> because no laser shots applied to this sample.
#> Warning in readBrukerFlexFile("spectra_D23/0_D23/1/1SLin/fid"): The spectrum file 'C:\data\active\tests\test-bruker\spectra_D23\0_D23\1\1SLin\fid' uses V1.0CTOF2CalibrationConstants.
#> V1.0CTOF2CalibrationConstants aren't supported by readBrukerFlexFile. See https://github.com/sgibb/MALDIquantForeign/issues/19 for details.
#> Could not convert time-of-flight values into mass!
m <- readMzXmlFile("spectra_D23.mzXML") # converted using ProteoWizard
# extract information
tof <- f$spectrum$tof
mz <- m$spectrum$mass
range(tof)
#> [1] 13987.0 94812.6
range(mz)
#> [1] 399.9186 19933.5835
length(tof)
#> [1] 50517
length(mz)
#> [1] 50517
# sample 100 points
(idx = as.integer(seq(1, length(tof), length.out = 100)))
#> [1] 1 511 1021 1531 2042 2552 3062 3572 4083 4593 5103 5613
#> [13] 6124 6634 7144 7654 8165 8675 9185 9695 10206 10716 11226 11737
#> [25] 12247 12757 13267 13778 14288 14798 15308 15819 16329 16839 17349 17860
#> [37] 18370 18880 19390 19901 20411 20921 21432 21942 22452 22962 23473 23983
#> [49] 24493 25003 25514 26024 26534 27044 27555 28065 28575 29085 29596 30106
#> [61] 30616 31127 31637 32147 32657 33168 33678 34188 34698 35209 35719 36229
#> [73] 36739 37250 37760 38270 38780 39291 39801 40311 40822 41332 41842 42352
#> [85] 42863 43373 43883 44393 44904 45414 45924 46434 46945 47455 47965 48475
#> [97] 48986 49496 50006 50517
tof = tof[idx]
mz = mz[idx]
# overview
plot(tof, mz) plot(tof, sqrt(mz)) # extract from the acqu file
##$NTBCal= <V3.0CCalibrator 12 1 0 0 -2147483648 2147483647 V1.0CHPCData Order 0 vCoeff V1.0VectorDouble 0 c2 0 c0 0 minMass 0 maxMass 0 bUse 0 endCHPCData V1.0CTOF2CalibrationConstants 13987.200000000001 1.5999999999999999 581.41079117084337 2220702.4352394971 -0.036090927547664166 7.3810763208576876e-005 0 2 V1.0CTOF2CalibrationConstants 13987.200000000001 1.5999999999999999 581.41079117084337 2220702.4352394971 -0.036090927547664166 7.3810763208576876e-005 0 2 >
##$NTBCalr= <V3.0CCalibrator 12 1 0 0 -2147483648 2147483647 V1.0CHPCData Order 0 vCoeff V1.0VectorDouble 0 c2 0 c0 0 minMass 0 maxMass 0 bUse 0 endCHPCData V1.0CTOF2CalibrationConstants 13987.200000000001 1.5999999999999999 581.41079117084337 2220702.4352394971 -0.036090927547664166 7.3810763208576876e-005 0 2 V1.0CTOF2CalibrationConstants 13987.200000000001 1.5999999999999999 581.41079117084337 2220702.4352394971 -0.036090927547664166 7.3810763208576876e-005 0 2 >
# define the calibration coefficients
Calib = c(581.41079117084337, sqrt(1e12/2220702.4352394971), -0.036090927547664166, 7.3810763208576876e-005, 0, 2)
# model of the data
df <- data.frame(tof=tof,x=sqrt(mz))
(mod2 = lm(tof ~ 1 + I(x) + I(x^2), data=df))
#>
#> Call:
#> lm(formula = tof ~ 1 + I(x) + I(x^2), data = df)
#>
#> Coefficients:
#> (Intercept) I(x) I(x^2)
#> 606.45893 669.77815 -0.01825
Calib[1:3]
#> [1] 581.41079117 671.04989940 -0.03609093
# the coefficients perfectly match the coefficients found by a linear fit
(mod4 = lm(tof ~ 1 + I(x) + I(x^2) + I(x^3) + I(x^4), data=df))
#>
#> Call:
#> lm(formula = tof ~ 1 + I(x) + I(x^2) + I(x^3) + I(x^4), data = df)
#>
#> Coefficients:
#> (Intercept) I(x) I(x^2) I(x^3) I(x^4)
#> 5.812e+02 6.710e+02 -3.609e-02 7.381e-05 -7.016e-18
Calib[1:5]
#> [1] 5.814108e+02 6.710499e+02 -3.609093e-02 7.381076e-05 0.000000e+00
# graphical view
plot(df$x, df$tof)
lines(df$x, predict(mod2), col = 2, lw = 2)
lines(df$x, predict(mod4), col = 4, lw = 2) # error on the estimated tof is very small, < 8e-11
plot(df$x, predict(mod4) - df$tof) Created on 2020-10-07 by the reprex package (v0.3.0) |
Final thought: using a cubic or 4th order (aka "enhanced cubic" as called by Bruker IIUC) is interesting. I showed that we interpret the equation to some extent (up to 3th order). I did a verification that the m/z equation equals the tof. I did not solve the equation in order to compute the m/z from the equation and the tof values. When the equation was of 2nd order, there is a symbolic solution, the one stated by @sgibb . Higher orders imply the use of polyroot function to solve the equation for sqrt(mz). I think we could solve the equation for 1000 points and interpolate the values between them. But there are maybe better solutions. # only 2nd order coefficients
# 0 = A * (sqrt(mz))^2 + B * sqrt(mz) + C(tof)
Re(polyroot(c(Calib[1]-tof[1], Calib[2], Calib[3])))^2
Re(polyroot(c(Calib[1]-tof[length(tof)], Calib[2], Calib[3])))^2 |
@SamGG thanks a lot for looking into it! Can you share your test files? Your work is already an improvement. Instead of not converting any |
Same problem here, I need to use the cubic enhanced calibration to get decent mass accuracies but also want to keep the metadata from readBrukerFlexData since it is very useful. I was thinking of a workaround, do you people think this approach is reliable?
Another more elegant/reliable idea could be to store the metadata from the acqu(s) file into mzML or mzXML files, do you think this is feasible? (compassXport documentation is not so user friendly, but it seems to me that this FlexVariableTable.xml may be useful) or if anybody has suggestions for other workarounds to keep the metadata I would greatly appreciat that. Thanks for the great contributions! |
Hi, |
Unfortunately not. You just could use the Bruker software to export fid to mzML and use |
I think the final variable may be some form of constant correction to the resulting library("readBrukerFlexData")
library("readMzXmlData")
f <- readBrukerFlexFile("0_P10/1/1Ref/fid") #, filterZeroIntensities=TRUE)
m <- readMzXmlFile("0_P10.mzXML")
A <- 6.1682474689786355
B <- sqrt(1e12/1164156.9998801197)
C <- 3884.868239229128
D <- -0.063083671657452864
E <- -34.362308485323794
mz <- m$spectrum$mass
tof <- f$spectrum$tof
mzPlusE = mz + E
sign = ((mzPlusE > 0) * 2)-1
df <- data.frame(tof=tof,x=sqrt(abs(mzPlusE))*sign)
(mod4 = lm(tof ~ 1 + I(x) + I(x^2) + I(x^3), data=df))
plot(df$x, df$tof)
lines(df$x, predict(mod4), col = 2, lw = 2) I don't get the exact coefficients, but fairly close:
|
A little more detail. The following (cubic transform) seems to produce the correct library('AlgebraicHaploPackage')
library("readBrukerFlexData")
library("readMzXmlData")
f <- readBrukerFlexFile("0_P10/1/1Ref/fid")
m <- readMzXmlFile("0_P10.mzXML")
mz <- m$spectrum$mass
tof <- f$spectrum$tof
A <- 6.1682474689786355
B <- sqrt(1e12/1164156.9998801197)
C <- 3884.868239229128
D <- -0.063083671657452864
E <- -34.362308485323794
a = D
b = A
c = B
d = C
root_mz <- c()
for(t in tof) {
root_mz <- append(root_mz, cubic(a, b, c, d-t)[3])
}
root_mz = Re(root_mz)
predicted_mz = root_mz*root_mz*sign(root_mz)-E
plot(mz, predicted_mz) It seems like the range up to this point is determined by a different function(?). |
Okay I think I have it now. It looks like the first part is quadratic and the rest is cubic. There is probably a much more efficient way to calculate this, but for testing here is the code: library("readBrukerFlexData")
library("readMzXmlData")
f <- readBrukerFlexFile("0_P10/1/1Ref/fid")
m <- readMzXmlFile("0_P10.mzXML")
mz <- m$spectrum$mass
tof <- f$spectrum$tof
A <- 6.1682474689786355
B <- sqrt(1e12/1164156.9998801197)
C <- 3884.868239229128
D <- -0.063083671657452864
E <- -34.362308485323794
a = D
b = A
c = B
d = C
# Calculate sqrt(mz) as if it were quadratic, using formula in https://github.com/sgibb/readBrukerFlexData/blob/master/R/tof2mass-functions.R
# But keep track of the sign (whether tof is bigger than C)
tofMinusC = (tof-C)
sign = ((tofMinusC > 0) * 2)-1
to_sqrt = ((B * B) - (4 * A * (tofMinusC)))
root_mz = (-B + sqrt(abs(to_sqrt)))/(2 * A)
predicted_mz <- c()
for(i in 1:length(tof)) {
# Check whether tof was larger than C
if(sign[i] < 0) {
# Use precalculated quadratic calibration
predicted_mz <- append(predicted_mz, (root_mz[i])^2 * sign[i] - E)
} else {
# Assume cubic with the form a*x^3 + b*x^2 + c*x + (d - tof)
# where x = sqrt(mz)
cur_root_mz = Re(polyroot(c(d-tof[i], c, b, a))[1])
predicted_mz <- append(predicted_mz, (cur_root_mz)^2 * sign[i] - E)
}
}
plot(mz, mz-predicted_mz) This results in errors ~= 10^-6 at |
@AlanRace that looks promising! Great stuff! I will give it a more detailed look in the next days! |
Dear @SamGG and @AlanRace thank you for your great work! Based on it I integrated the cubic calibration. I add you both as contributors to the Please find the cubic calibration in the issue3-ctof2calibration branch. remotes::install_github("sgibb/readBrukerFlexData@issue3-ctof2calibration") Something I currently not understand completely. It seems to be important to change |
Looks great to me! I'll try and test it properly over the next few days. Happy for you to add my ORCID (https://orcid.org/0000-0001-8996-2641) and email (alan.race@uni-marburg.de). I guess if you wanted it to be consistent between quadratic and cubic part you could just do abs(tof - C)? I guess this is just a measure of distance (in time) away from this pivot point where the function swaps over. |
Finally we could close this issue. Thank you all for your support and patience! I uploaded a new version (1.9.0) to CRAN. |
When I read the *.fid, I got a wrong parsing. As you can see in following figures:
When I read the bruker *.fid, I got this
But if I read the transformed *.mzxml, I got this
There is a shift in m/z. After suspected the file in bruker software, I found the first one is wrong. For your reference, I also attached the input files here.
0_P10.mzXML.zip
0_P10.zip
The text was updated successfully, but these errors were encountered: