Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errorously parsing Bruker *.fid #3

Closed
hsiaoyi0504 opened this issue Dec 24, 2016 · 35 comments
Closed

Errorously parsing Bruker *.fid #3

hsiaoyi0504 opened this issue Dec 24, 2016 · 35 comments
Assignees

Comments

@hsiaoyi0504
Copy link

When I read the *.fid, I got a wrong parsing. As you can see in following figures:

When I read the bruker *.fid, I got this
before_bruker

But if I read the transformed *.mzxml, I got this
after_mzxml

There is a shift in m/z. After suspected the file in bruker software, I found the first one is wrong. For your reference, I also attached the input files here.
0_P10.mzXML.zip
0_P10.zip

@sgibb
Copy link
Owner

sgibb commented Dec 26, 2016

@hsiaoyi0504 thanks for your bug report and the attached files. readBrukerFlexData (the package used for fid file import) fails to calculate the m/z values here.
To be honest I don't know why. Bruker didn't comment their file format and everything I know about it was found by trial and error.
I guess you measured the data in reflector mode. I compared it to some files I have and can't find any important difference yet. So I don't even know how to figure out when this problem arises nor how to fix it.
Please find a diff here

@sgibb sgibb self-assigned this Dec 26, 2016
@hsiaoyi0504
Copy link
Author

OK, I got it, but I am curious about how you trial and error to figure out the format of Bruker?

@sgibb
Copy link
Owner

sgibb commented Dec 26, 2016

If you are interested you can find all I know about the format in this wiki: https://github.com/sgibb/readBrukerFlexData/wiki/acqu-file

The important part is the time-of-flight to m/z conversion described in Titulaer et al 2006 and implemented in https://github.com/sgibb/readBrukerFlexData/blob/master/R/tof2mass-functions.R

Do you use an autoflexTOF/TOF device (my data are from an ultraflexTOF/TOF)? It seems the deflector was not activated (##$DEFLON= no). Is that correct?

@hsiaoyi0504
Copy link
Author

My colleague uses autoflex speed TOF/TOF, and I am not sure if he uses reflector mode. I will check that.

@hsiaoyi0504
Copy link
Author

hsiaoyi0504 commented Dec 27, 2016

I confirmed that we used reflector mode.

@sgibb
Copy link
Owner

sgibb commented Dec 28, 2016

I am curious about how you trial and error to figure out the format of Bruker?

I modified the acqu(s) files compared the generated mzXML files with my code etc.. Also the article Titulaer et al 2006 and the FlexVariableTable.xml file from CompassXport was helpful.


I think I found the important difference. In all my own files the conversion from tof to m/z follows a quadratic equation: 0 = A * (sqrt(mz))^2 + B * sqrt(mz) + C(tof) and is coded in the acqu file (see also: https://github.com/sgibb/readBrukerFlexData/wiki/acqu-file#tof-calculation and https://github.com/sgibb/readBrukerFlexData/wiki/acqu-file#mass-calculation):

##$DELAY= 41161 
##$DW= 0.5 
##$ML1= 416186.134282719 
##$ML2= 180.735984610665 
##$ML3= -0.0485676649403545 
##$TD= 114222 

where

TOF = DELAY + (0:(TD-1)) * DW
A   = ML3
B   = sqrt(1e12/ML1)
C   = ML2 - TOF
MZ  = (-B + sqrt(B^2 - (4 * A * C))/(2 * A))^2

The newer files contain an additional variable NTBCal that includes among others the HPC calibration constants but also the tof to m/z constants (V3.0CTOFCalibrationConstants):

##$NTBCal= <V3.0CCalibrator 1 1 0 0 -2147483648 2147483647 V1.0CHPCData Order 10 vCoeff V1.0VectorDouble 11 -0.30899977070432588 0.00012359234583847467 1.493619815861809e-006 -2.7525708852997263e-009 2.2224914282887503e-012 -9.267322574910882e-016 1.4960981457711781e-019 3.0715708656370546e-023 -1.8530337885161092e-026 3.203442186133685e-030 -1.9959759254400677e-034 c2 -0.048940611526270571 c0 225.86975630715625 minMass 736.50266799999997 maxMass 3814.7214599999998 bUse 1 endCHPCData V3.0CTOFCalibrationConstants 41160.999999999993 0.49999999999999994 180.73598461066527 416186.13428271882 -0.048567664940354464 2 V3.0CTOFCalibrationConstants 41160.999999999993 0.49999999999999994 181.7119550333 416153.63347654999 -0.048420739890736003 2 >

It seems to have the following pattern:

##$NTBCal= < ... V3.0CTOFCalibrationConstants DELAY DW C B A 2 >

Your file has the following string:

##$NTBCal= <V3.0CCalibrator 12 1 0 0 -2147483648 2147483647 V1.0CHPCData Order 0 vCoeff V1.0VectorDouble 0 c2 0 c0 0 minMass 0 maxMass 0 bUse 0 endCHPCData V1.0CTOF2CalibrationConstants 48 0.25 3884.868239229128 1164156.9998801197 6.1682474689786355 -0.063083671657452864 -34.362308485323794 2 V1.0CTOF2CalibrationConstants 0 0 0 0 0 0 0 -1 >

Which could be:

##$NTBCal= < ... V3.0CTOFCalibrationConstants DELAY DW C B A D E 2 >

and maybe should be used in a 4-order-polynomial:

0 = E * (sqrt(mz))^4 + D * (sqrt(mz))^3 + A * (sqrt(mz))^2 + B * sqrt(mz) + C(tof)

I won't have time to test it this year (have to work each night for 12 hours the next days). If you like you could try it. To read the tof data you could use:

library("readBrukerFlexData")
f <- readBrukerFlexFile("0_P10/1/1Ref/fid", filterZeroIntensities=TRUE)

f$spectrum$tof
f$spectrum$mass # wrong mass
f$spectrum$intensity

It would be great to support your file too. If the above is not working I am going to throw an not supported error if more than 3 constants are present in NTBCal.

@sgibb
Copy link
Owner

sgibb commented Jan 2, 2017

Unfortunately this idea doesn't work:

library("readBrukerFlexData")
library("readMzXmlData")

f <- readBrukerFlexFile("0_P10/1/1Ref/fid", filterZeroIntensities=TRUE)
m <- readMzXmlFile("P10_raw.mzXML")

A <- 6.1682474689786355
B <- sqrt(1e12/1164156.9998801197)
C <- 3884.868239229128
D <- -0.063083671657452864
E <- -34.362308485323794

mz <- m$spectrum$mass
tof <- f$spectrum$tof

y <- D * (sqrt(mz))^4 + E * (sqrt(mz))^3 + A * (sqrt(mz))^2 + B * sqrt(mz) + C - tof

png("quad.png")
plot(mz, y, xlab="mz")
dev.off()

quad

The difference between "the old" quadratic equation and the new unknown formula seems to be quadratic itself:

d <- m$spectrum$mass - f$spectrum$mass
png("diff.png")
plot(m$spectrum$mass, d, type="b", xlab="m/z", ylab="difference")
dev.off()

diff

Any ideas?

@Armadilloa16
Copy link

Hi There,

I was looking into reverse engineering the Bruker file formats awhile back and although I ended up putting that in the "too much effort" basket for the time being I kept an eye on this repo to come back too at some point. Anyway I'd like to help if I can but I'm not quite sure how.

I took a quick look at the data uploaded and you did already @sgibb, and taking what @hsiaoyi0504 said as an assumption (that the mzXML data is being read in correctly?), we can see that it is a quadratic relation to TOF, mz = A * tof^2 + B * tof + C with coefficients, A = 7.915023e-07, B = -2.619041e-03, C = 2.599336e+01

library("readBrukerFlexData")
library("readMzXmlData")

fid <- readBrukerFlexFile("0_P10/1/1Ref/fid")
xml <- readMzXmlFile("0_P10.mzXML")

df = data.frame(tof = fid$spectrum$tof,
                mz  = xml$spectrum$mass)
m = lm(mz ~ I(tof^2) + I(tof), data = df)
summary(m)

I'm not sure how the coefficients match up to the parameters in the acqu file though, looking at the values that are apparently normally used,

##$DELAY= 48 
##$DW= 0.25 
##$ML1= 1164156.99988012 
##$ML2= 3884.86823922913 
##$ML3= 6.16824746897864 
##$TD= 206848 

the transformed sqrt(1e12 / ML1) which evaluates to 926.8175 and the two other values in the
##$NTB= < ...> parameter, -0.063083671657452864 and -34.362308485323794, nothing
jumps out at me...

@elsagorre
Copy link

Hi,
I was previously experiencing the same problems where when I imported my data via the importBrukerFlex, there was a mass shift. We figured out that the reason was because after the spectra were collected, they were calibrated using the Cubic enhanced, and the importBrukerFlex needs the spectra to be calibrated using the Quadratic function. Try to calibrate your spectra in FlexAnalysis using the quadratic function, save them and then import them back to R. That should remove the mass shift you see.

Hope this helps!

@hsiaoyi0504
Copy link
Author

So it seems that we need to add this into the user docs?

@elsagorre
Copy link

Well, either that or hopefully we can figure out a way to add another variable to the TOF equation, which would change the quadratic equation into the cubic one.

@sgibb
Copy link
Owner

sgibb commented Jan 9, 2017

@hsiaoyi0504 So it seems that we need to add this into the user docs?

I am going to try to figure out how to import this type of files correctly. If we don't get it, I am going to add an error if the NTBCal string contains more than 3 calibration constants.

@Armadilloa16 thanks for testing. The relation is tof ~ sqrt(mz) not tof ~ mz. But even than I can't find any useful answer.

Using the attached files (0_A20.zip, generated by an autoflex device) assuming a quadratic relationship works (as we already know):

library("readBrukerFlexData")
library("readMzXmlData")

f <- readBrukerFlexFile("0_A20/1/1SLin/fid")
m <- readMzXmlFile("0_A20/1/1SLin/0_A20.mzXML")

mz <- m$spectrum$mass
tof <- f$spectrum$tof

# the NTBCal string:
# 19886 0.9999999999999999 268.44302617844 2597289.7995303 -0.004433520310037
cc <- f$metaData$calibrationConstants
cc[1] <- sqrt(1e12/cc[1])

df <- data.frame(tof=tof,
                 x=sqrt(mz))
m <- lm(tof ~ I(x^2) + I(x), data=df)

cbind(acqu=cc[c(2, 3, 1)], model=coef(m))
#            acqu        model
# c2 268.44302618 268.44302618
# c3  -0.00443352  -0.00443352
# c1 620.49715567 620.49715567

The same for the @hsiaoyi0504's files is not working, neither for the cubic nor the 4th-order-polynomial model:

library("readBrukerFlexData")
library("readMzXmlData")

f <- readBrukerFlexFile("0_P10/1/1Ref/fid", filterZeroIntensities=TRUE)
m <- readMzXmlFile("P10_raw.mzXML")

mz <- m$spectrum$mass
tof <- f$spectrum$tof

# the NTBCal string:
# 48 0.25 3884.868239229128 1164156.9998801197 6.1682474689786355 
# -0.063083671657452864 -34.362308485323794
df <- data.frame(tof=tof,
                 x=sqrt(mz))

lm(tof ~ I(x^2) + I(x), data=df)
#
# Call:
# lm(formula = tof ~ I(x^2) + I(x), data = df)
#
# Coefficients:
# (Intercept)       I(x^2)         I(x)
#    -282.260       -1.054     1205.826

lm(tof ~ I(x^3) + I(x^2) + I(x), data=df)
#
# Call:
# lm(formula = tof ~ I(x^3) + I(x^2) + I(x), data = df)
#
# Coefficients:
# (Intercept)       I(x^3)       I(x^2)         I(x)
#  -1.313e+03    8.308e-02   -7.498e+00    1.357e+03

lm(tof ~ I(x^4) + I(x^3) + I(x^2) + I(x), data=df)
#
# Call:
# lm(formula = tof ~ I(x^4) + I(x^3) + I(x^2) + I(x), data = df)
#
# Coefficients:
# (Intercept)       I(x^4)       I(x^3)       I(x^2)         I(x)
#  -3.632e+03   -9.276e-03    1.034e+00   -4.131e+01    1.842e+03

Could it be that it depends on the device? It is a TOF/TOF. So maybe there are different constants/formulas for each flight tube/mass analyzer? (resulting in an additive model?)

@recumbentbirder
Copy link

recumbentbirder commented Jan 26, 2017

Sorry I didn't have the time yet to look into it. And didn't in depth still. But let my just comment on one thing that seems very suspicious to me, which is the delay value: the first two examples (graphs) in this thread seem correct to me except for the mass shift. The shape is ok. That means, the problem lies in the 'delay' alone, which is the time between shot and recording data.

So basically, when you want to record a spectrum from lower masses of 2000 Da, you shoot, and then just let pass by the ions with masses smaller than that. Just wait, until they passed. This is the delay. And when the delay is in fact 48 as in the 'problematic' cases, the shift in the result seems 'correct' to me.

Perhaps we have to figure out how to find the correct delay? Because obviously other tools can do it.

@sgibb
Copy link
Owner

sgibb commented Jan 27, 2017

@recumbentbirder thanks for looking into this. In fact the delay seems very short but wouldn't a different delay just result in a linear mass shift? But the spectrum is stretched and shifted as well.

@recumbentbirder
Copy link

@sgibb oh, yes! Sorry. See, I didn't look that deep into it. I don't know yet, when I have a little more time :-/

@recumbentbirder
Copy link

Let me contribute with a screenshot of said spectrum as displayed by Bruker's flexAnalysis:

to be honest, of all the plots now, this seems the most credible to me. Considering the low delay value, translating directly in a low minimum mass. But this is just a matter of displaying it: I guess you set 'xlim'?

But one interesting observation: in the plotted range, the maximum arbitrary unit intensity is shown as around 2000 by flexAnalysis for the maximum peak at 700 Da. Though the units are arbitrary (dependent on how many shots you sum up), the values in a spectrum file are 'absolute': there's normally no normalization done by default for plotting ...

spec_flanal

@hsiaoyi0504, how did you come up with the mzXML file? A Bruker software, or third party?

@hsiaoyi0504
Copy link
Author

My partner used the mzconvert in the proteowizard to convert the files.

@sgibb
Copy link
Owner

sgibb commented Apr 22, 2017

Sorry for the long delay. I just uploaded readBrukerFlexData 1.8.5 to CRAN. In this version readBrukerFlexData throws a warning if it founds the V1.0CTOF2CalibrationConstants string in NTBCal and doesn't convert the TOF data to m/z values anymore.

@sgibb sgibb transferred this issue from sgibb/MALDIquantForeign Nov 3, 2018
@sgibb sgibb mentioned this issue Aug 14, 2020
@SamGG
Copy link

SamGG commented Sep 27, 2020

Same problem. Does anybody ask Bruker directly?

@sgibb
Copy link
Owner

sgibb commented Sep 29, 2020

Back in 2010 and in early 2011 when I started the development of readBrukerFlexData I had some conversations with two developers and an official speaker from Bruker (about HPC). They weren't allowed to give me any details on the file format.
I assume that this attitude didn't changed but I don't know for sure.

@SamGG
Copy link

SamGG commented Oct 7, 2020

Bruker's policy didn't change. The polynomial equation is confirmed. HPC is patented without any communication.

I spent hours at understanding the question. I think I reached some conclusions (see 2nd next comment, thanks @sgibb for the equation coefficients). I need some help to finish...

I would like a spectra with a fifth calibration coefficient not zero. The one of @hsiaoyi0504 seems problematic to me (next comment).

I would like some feedback from whoever is listening here.

Best,
Samuel

@SamGG
Copy link

SamGG commented Oct 7, 2020

The file from @hsiaoyi0504 but still don't understand it. What is surprising is the shape of the transformation. I plotted the sqrt(m/z) from the mzXML versus tof and the curve at low tof is somehow odd for me.

image

@SamGG
Copy link

SamGG commented Oct 7, 2020

library("readBrukerFlexData")
library("readMzXmlData")

f <- readBrukerFlexFile("spectra_P10/0_P10/1/1Ref/fid")
m <- readMzXmlFile("0_P10.mzXML")

tof <- f$spectrum$tof
mz <- m$spectrum$mass

range(tof)
range(mz)

length(tof)
length(mz)

(idx = as.integer(seq(1, length(tof), length.out = 100)))

tof = tof[idx]
mz  = mz[idx]

plot(tof, mz)
plot(tof, sqrt(mz))


##$NTBCal= <V3.0CCalibrator 12 1 0 0 -2147483648 2147483647  V1.0CHPCData  Order 0 vCoeff V1.0VectorDouble 0  c2 0 c0 0 minMass 0 maxMass 0 bUse 0 endCHPCData V1.0CTOF2CalibrationConstants 48 0.25 3884.868239229128 1164156.9998801197 6.1682474689786355 -0.063083671657452864 -34.362308485323794 2 V1.0CTOF2CalibrationConstants 0 0 0 0 0 0 0 -1  > 
##$NTBCalr= <V3.0CCalibrator 12 1 0 0 -2147483648 2147483647  V1.0CHPCData  Order 0 vCoeff V1.0VectorDouble 0  c2 0 c0 0 minMass 0 maxMass 0 bUse 0 endCHPCData V1.0CTOF2CalibrationConstants 48 0.25 3884.868239229128 1164156.9998801197 6.1682474689786355 -0.063083671657452864 -34.362308485323794 2 V1.0CTOF2CalibrationConstants 48 0.25 3884.868239229128 1164156.9998801197 6.1682474689786355 -0.063083671657452864 -34.362308485323794 2> 

Calib = c(3884.868239229128, sqrt(1e12/1164156.9998801197), 6.1682474689786355, -0.063083671657452864, -34.362308485323794, 2)

df <- data.frame(tof=tof,x=sqrt(mz))
plot(df$tof, df$x)

I didn't succeed in fitting it with the equation, but the equation is OK if the fifth coefficient is zero. This means the equation is cubic instead of 4th order. In fact, @sgibb has nearly get the correct equation except he switched the last two coefficients (D<=>E). Here is my code and a test file (matrix control).

library("readBrukerFlexData")
library("readMzXmlData")
#> Warning: package 'readMzXmlData' was built under R version 4.0.2

# define file location and read
setwd("C:/data/active/tests/test-bruker/")
f <- readBrukerFlexFile("spectra_D23/0_D23/1/1SLin/fid")
#> Warning in .readAcquFile(fidFile = fidFile, verbose = verbose): File 'C:
#> \data\active\tests\test-bruker\spectra_D23\0_D23\1\1SLin\fid' seems to be empty
#> because no laser shots applied to this sample.
#> Warning in readBrukerFlexFile("spectra_D23/0_D23/1/1SLin/fid"): The spectrum file 'C:\data\active\tests\test-bruker\spectra_D23\0_D23\1\1SLin\fid' uses V1.0CTOF2CalibrationConstants.
#> V1.0CTOF2CalibrationConstants aren't supported by readBrukerFlexFile. See https://github.com/sgibb/MALDIquantForeign/issues/19 for details.
#> Could not convert time-of-flight values into mass!
m <- readMzXmlFile("spectra_D23.mzXML")  # converted using ProteoWizard

# extract information
tof <- f$spectrum$tof
mz <- m$spectrum$mass

range(tof)
#> [1] 13987.0 94812.6
range(mz)
#> [1]   399.9186 19933.5835

length(tof)
#> [1] 50517
length(mz)
#> [1] 50517

# sample 100 points
(idx = as.integer(seq(1, length(tof), length.out = 100)))
#>   [1]     1   511  1021  1531  2042  2552  3062  3572  4083  4593  5103  5613
#>  [13]  6124  6634  7144  7654  8165  8675  9185  9695 10206 10716 11226 11737
#>  [25] 12247 12757 13267 13778 14288 14798 15308 15819 16329 16839 17349 17860
#>  [37] 18370 18880 19390 19901 20411 20921 21432 21942 22452 22962 23473 23983
#>  [49] 24493 25003 25514 26024 26534 27044 27555 28065 28575 29085 29596 30106
#>  [61] 30616 31127 31637 32147 32657 33168 33678 34188 34698 35209 35719 36229
#>  [73] 36739 37250 37760 38270 38780 39291 39801 40311 40822 41332 41842 42352
#>  [85] 42863 43373 43883 44393 44904 45414 45924 46434 46945 47455 47965 48475
#>  [97] 48986 49496 50006 50517

tof = tof[idx]
mz  = mz[idx]

# overview
plot(tof, mz)

plot(tof, sqrt(mz))

# extract from the acqu file
##$NTBCal= <V3.0CCalibrator 12 1 0 0 -2147483648 2147483647  V1.0CHPCData  Order 0 vCoeff V1.0VectorDouble 0  c2 0 c0 0 minMass 0 maxMass 0 bUse 0 endCHPCData V1.0CTOF2CalibrationConstants 13987.200000000001 1.5999999999999999 581.41079117084337 2220702.4352394971 -0.036090927547664166 7.3810763208576876e-005 0 2 V1.0CTOF2CalibrationConstants 13987.200000000001 1.5999999999999999 581.41079117084337 2220702.4352394971 -0.036090927547664166 7.3810763208576876e-005 0 2  > 
##$NTBCalr= <V3.0CCalibrator 12 1 0 0 -2147483648 2147483647  V1.0CHPCData  Order 0 vCoeff V1.0VectorDouble 0  c2 0 c0 0 minMass 0 maxMass 0 bUse 0 endCHPCData V1.0CTOF2CalibrationConstants 13987.200000000001 1.5999999999999999 581.41079117084337 2220702.4352394971 -0.036090927547664166 7.3810763208576876e-005 0 2 V1.0CTOF2CalibrationConstants 13987.200000000001 1.5999999999999999 581.41079117084337 2220702.4352394971 -0.036090927547664166 7.3810763208576876e-005 0 2  > 

# define the calibration coefficients
Calib = c(581.41079117084337, sqrt(1e12/2220702.4352394971), -0.036090927547664166, 7.3810763208576876e-005, 0, 2)


# model of the data
df <- data.frame(tof=tof,x=sqrt(mz))

(mod2 = lm(tof ~ 1 + I(x) + I(x^2), data=df))
#> 
#> Call:
#> lm(formula = tof ~ 1 + I(x) + I(x^2), data = df)
#> 
#> Coefficients:
#> (Intercept)         I(x)       I(x^2)  
#>   606.45893    669.77815     -0.01825
Calib[1:3]
#> [1] 581.41079117 671.04989940  -0.03609093

# the coefficients perfectly match the coefficients found by a linear fit
(mod4 = lm(tof ~ 1 + I(x) + I(x^2) + I(x^3) + I(x^4), data=df))
#> 
#> Call:
#> lm(formula = tof ~ 1 + I(x) + I(x^2) + I(x^3) + I(x^4), data = df)
#> 
#> Coefficients:
#> (Intercept)         I(x)       I(x^2)       I(x^3)       I(x^4)  
#>   5.812e+02    6.710e+02   -3.609e-02    7.381e-05   -7.016e-18
Calib[1:5]
#> [1]  5.814108e+02  6.710499e+02 -3.609093e-02  7.381076e-05  0.000000e+00

# graphical view
plot(df$x, df$tof)
lines(df$x, predict(mod2), col = 2, lw = 2)
lines(df$x, predict(mod4), col = 4, lw = 2)

# error on the estimated tof is very small, < 8e-11
plot(df$x, predict(mod4) - df$tof)

Created on 2020-10-07 by the reprex package (v0.3.0)

@SamGG
Copy link

SamGG commented Oct 7, 2020

Final thought: using a cubic or 4th order (aka "enhanced cubic" as called by Bruker IIUC) is interesting. I showed that we interpret the equation to some extent (up to 3th order). I did a verification that the m/z equation equals the tof. I did not solve the equation in order to compute the m/z from the equation and the tof values. When the equation was of 2nd order, there is a symbolic solution, the one stated by @sgibb . Higher orders imply the use of polyroot function to solve the equation for sqrt(mz). I think we could solve the equation for 1000 points and interpolate the values between them. But there are maybe better solutions.

# only 2nd order coefficients
# 0 = A * (sqrt(mz))^2 + B * sqrt(mz) + C(tof)
Re(polyroot(c(Calib[1]-tof[1], Calib[2], Calib[3])))^2
Re(polyroot(c(Calib[1]-tof[length(tof)], Calib[2], Calib[3])))^2

@sgibb
Copy link
Owner

sgibb commented Nov 19, 2020

@SamGG thanks a lot for looking into it! Can you share your test files? Your work is already an improvement. Instead of not converting any tof to mz if V1.0CTOF2CalibrationConstants is present in the acqu file we could at least convert up to 3th order polynomials.

@czanetti
Copy link

czanetti commented Dec 19, 2020

Same problem here, I need to use the cubic enhanced calibration to get decent mass accuracies but also want to keep the metadata from readBrukerFlexData since it is very useful.
If I am not misunderstood, there was some progress but we are not there yet for the exact conversion, I also gave it a try but soon found out it is not an easy task.

I was thinking of a workaround, do you people think this approach is reliable?

  1. use the spectra calibrated with the cubic enhanced equation (or any other unsupported calibration)
  2. export the valid m/z spectra in ASCII or other open formats (e.g flexAnalysis supports batch exporting in ASCII, retaining sample name and fid path)
  3. read the acquired data (fid) with readBrukerFlexData to get all the metadata into R (but wrong m/z values)
  4. substitute the mass and intensity values stored by readBrukerFlexData with the correct ASCII values

Another more elegant/reliable idea could be to store the metadata from the acqu(s) file into mzML or mzXML files, do you think this is feasible? (compassXport documentation is not so user friendly, but it seems to me that this FlexVariableTable.xml may be useful)

or if anybody has suggestions for other workarounds to keep the metadata I would greatly appreciat that.

Thanks for the great contributions!
Best,
Cristian

@FrederikHolck
Copy link

Hi,
We've recently run into the same issue with bruker flex data and V1.0CTOF2CalibrationConstants not being supported and read by readBrukerFlexFile. Are there any news on the development?
I'm afraid I'm not qualified to help in any way. We simply wish to read bruker flex data and export to MSD files.
Best,
Frederik

@sgibb
Copy link
Owner

sgibb commented May 20, 2021

Unfortunately not. You just could use the Bruker software to export fid to mzML and use MALDIquantForeign to import this mzML and export it to MSD.

@AlanRace
Copy link

AlanRace commented Apr 5, 2022

I think the final variable may be some form of constant correction to the resulting m/z. That would explain why when this value is 0, the cubic equation fits (as @SamGG showed).

library("readBrukerFlexData")
library("readMzXmlData")

f <- readBrukerFlexFile("0_P10/1/1Ref/fid") #, filterZeroIntensities=TRUE)
m <- readMzXmlFile("0_P10.mzXML")

A <- 6.1682474689786355
B <- sqrt(1e12/1164156.9998801197)
C <- 3884.868239229128
D <- -0.063083671657452864
E <- -34.362308485323794

mz <- m$spectrum$mass
tof <- f$spectrum$tof

mzPlusE = mz + E
sign = ((mzPlusE > 0) * 2)-1

df <- data.frame(tof=tof,x=sqrt(abs(mzPlusE))*sign) 

(mod4 = lm(tof ~ 1 + I(x) + I(x^2) + I(x^3), data=df))

plot(df$x, df$tof)
lines(df$x, predict(mod4), col = 2, lw = 2)

x_vs_tof

I don't get the exact coefficients, but fairly close:

Call:
lm(formula = tof ~ 1 + I(x) + I(x^2) + I(x^3), data = df)

Coefficients:
(Intercept)         I(x)       I(x^2)       I(x^3)  
 3845.78915    934.56039      5.79341     -0.05795  
3884.8682    926.8175    6.16824    -0.0630

@AlanRace
Copy link

AlanRace commented Apr 6, 2022

A little more detail. The following (cubic transform) seems to produce the correct m/zs for the 'upper' mass range (i.e. above the m/z dictated by the final coefficient).

library('AlgebraicHaploPackage')
library("readBrukerFlexData")
library("readMzXmlData")

f <- readBrukerFlexFile("0_P10/1/1Ref/fid") 
m <- readMzXmlFile("0_P10.mzXML")

mz <- m$spectrum$mass
tof <- f$spectrum$tof

A <- 6.1682474689786355
B <- sqrt(1e12/1164156.9998801197)
C <- 3884.868239229128
D <- -0.063083671657452864
E <- -34.362308485323794

a = D
b = A
c = B
d = C

root_mz <- c()

for(t in tof) {
  root_mz <- append(root_mz, cubic(a, b, c, d-t)[3])
}

root_mz = Re(root_mz)
predicted_mz = root_mz*root_mz*sign(root_mz)-E 

plot(mz, predicted_mz)

index_vs_difference

It seems like the range up to this point is determined by a different function(?).

mz_vs_difference

@AlanRace
Copy link

AlanRace commented Apr 7, 2022

Okay I think I have it now. It looks like the first part is quadratic and the rest is cubic. There is probably a much more efficient way to calculate this, but for testing here is the code:

library("readBrukerFlexData")
library("readMzXmlData")

f <- readBrukerFlexFile("0_P10/1/1Ref/fid") 
m <- readMzXmlFile("0_P10.mzXML")

mz <- m$spectrum$mass
tof <- f$spectrum$tof

A <- 6.1682474689786355
B <- sqrt(1e12/1164156.9998801197)
C <- 3884.868239229128
D <- -0.063083671657452864
E <- -34.362308485323794

a = D
b = A
c = B
d = C

# Calculate sqrt(mz) as if it were quadratic, using formula in https://github.com/sgibb/readBrukerFlexData/blob/master/R/tof2mass-functions.R
# But keep track of the sign (whether tof is bigger than C)
tofMinusC = (tof-C)
sign = ((tofMinusC > 0) * 2)-1
to_sqrt = ((B * B) - (4 * A * (tofMinusC)))
root_mz = (-B + sqrt(abs(to_sqrt)))/(2 * A)

predicted_mz <- c()

for(i in 1:length(tof)) {
  # Check whether tof was larger than C
  if(sign[i] < 0) {
    # Use precalculated quadratic calibration
    predicted_mz <- append(predicted_mz, (root_mz[i])^2 * sign[i] - E)
  } else {
    # Assume cubic with the form a*x^3 + b*x^2 + c*x + (d - tof) 
    # where x = sqrt(mz)
    cur_root_mz = Re(polyroot(c(d-tof[i], c, b, a))[1])
    
    predicted_mz <- append(predicted_mz, (cur_root_mz)^2 * sign[i] - E)
  }
}

plot(mz, mz-predicted_mz)

This results in errors ~= 10^-6 at m/z ~= 2000, but I think that is okay for most purposes.

calibration

@sgibb
Copy link
Owner

sgibb commented Apr 12, 2022

@AlanRace that looks promising! Great stuff! I will give it a more detailed look in the next days!

@sgibb
Copy link
Owner

sgibb commented Apr 15, 2022

Dear @SamGG and @AlanRace thank you for your great work! Based on it I integrated the cubic calibration. I add you both as contributors to the DESCRIPTION file. Do you want to add some additional information (ORCID, email, ...?) there? Please feel free to ask for any changes.

Please find the cubic calibration in the issue3-ctof2calibration branch.
It would be great if you could review and test it. If you are both satisfied with it I am going to publish a new version on CRAN.
To test use:

remotes::install_github("sgibb/readBrukerFlexData@issue3-ctof2calibration")

Something I currently not understand completely. It seems to be important to change C - tof into tof - C for the quadratic part (if you compare the equation in tof2mass-functions.R and ntbcal-functions.R (I renamed C into D in the cubic code.)). I would assume that the quadratic part is of this should be identical to the original tof to mz equation.

@AlanRace
Copy link

AlanRace commented May 4, 2022

Looks great to me! I'll try and test it properly over the next few days.

Happy for you to add my ORCID (https://orcid.org/0000-0001-8996-2641) and email (alan.race@uni-marburg.de).

I guess if you wanted it to be consistent between quadratic and cubic part you could just do abs(tof - C)? I guess this is just a measure of distance (in time) away from this pivot point where the function swaps over.

@sgibb sgibb closed this as completed in f882258 Jun 19, 2022
@sgibb
Copy link
Owner

sgibb commented Jun 19, 2022

Finally we could close this issue. Thank you all for your support and patience!

I uploaded a new version (1.9.0) to CRAN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants