Document FID binary representation #57

Open
wilsonmichael opened this issue Oct 4, 2013 · 6 comments
@wilsonmichael
Member

Document the FID binary representation to make reading and writing it easier in different languages, and add code snippets for Java, Python, R, and Ruby.

@ghost ghost assigned wilsonmichael Oct 4, 2013
@rsalek
Member

rsalek commented Oct 4, 2013

.FID file
File extension: FID
File type: Bruker Aspect NMR data file (also used by Varian)

SER: Bruker 2D FID. Each individual FID is appended sequentially to the SER file by the pulse sequence.
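A minimal Python sketch of what "appended sequentially" means for a reader. It assumes the SER data has already been decoded into interleaved (real, imaginary) floats and that every FID has the same number of complex points. `split_ser` is a hypothetical name, and the sketch ignores any vendor-specific padding between FIDs.

```python
def split_ser(flat, points_per_fid):
    """Split interleaved SER data into individual FIDs.

    `flat` holds adjacent (real, imaginary) floats; `points_per_fid` is the
    number of complex points per FID. Assumption: FIDs are stored back to
    back with no padding between them.
    """
    size = 2 * points_per_fid  # floats per FID (two per complex point)
    if size == 0 or len(flat) % size:
        raise ValueError("data length is not a whole number of FIDs")
    return [flat[i:i + size] for i in range(0, len(flat), size)]


ser = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
print(split_ser(ser, 2))  # [[1.0, 1.0, 2.0, 2.0], [3.0, 3.0, 4.0, 4.0]]
```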

For the data format, see: http://www.onemoonscientific.com/nvjChunks/ch05.html
nmrglue, which Michael talked about:
https://code.google.com/p/nmrglue/

@wilsonmichael
Member Author

Hello all,

There are multiple ways of storing the data in the different formats, but the decision is to standardize a single definition in the nmrML format.

We have already decided to encode the raw FID as a binary blob that is optionally compressed (zipped) and then base64-encoded, so that it can be stored as text in the XML instance. The encoded length is also stored, so that a super-efficient parser can skip right over this chunk if it doesn't need to parse it.
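A minimal Python sketch of that pipeline. It assumes little-endian IEEE 754 float64 values and uses zlib for the "zip" step; neither the byte order nor the compression algorithm is fixed in this thread. `encode_fid` and `decode_fid` are hypothetical names. The returned length is the number of base64 characters, which is what a parser would use to skip the chunk.

```python
import base64
import struct
import zlib


def encode_fid(values, compress=True):
    """Pack floats, optionally zlib-compress, then base64-encode.

    Assumptions: little-endian float64 ("<d") and zlib compression.
    Returns the base64 text and its encoded length in characters.
    """
    raw = struct.pack(f"<{len(values)}d", *values)
    if compress:
        raw = zlib.compress(raw)
    text = base64.b64encode(raw).decode("ascii")
    return text, len(text)


def decode_fid(text, compressed=True):
    """Reverse of encode_fid: base64-decode, optionally decompress, unpack."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    count = len(raw) // 8  # 8 bytes per float64
    return list(struct.unpack(f"<{count}d", raw))


text, encoded_length = encode_fid([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
print(encoded_length, decode_fid(text))
```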

I think we should store the data in row-major order in all cases. Because we use complex numbers, a more precise definition is needed; I explain it below and will use this discussion as the basis for the definition in the documentation. Hopefully this makes it clear, or someone will catch me if I have made a mistake.

For a 1D FID:
[1+1i,2+2i,3+3i]

When we store it, we flatten the complex numbers into adjacent floats, giving:
[1,1i,2,2i,3,3i]
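The same flattening in Python, as a sketch with hypothetical helper names: each complex sample becomes its real part followed by its imaginary part, and the reverse step pairs adjacent floats back up.

```python
def interleave(samples):
    """Flatten complex samples into adjacent (real, imaginary) floats."""
    flat = []
    for z in samples:
        flat.append(z.real)
        flat.append(z.imag)
    return flat


def deinterleave(flat):
    """Rebuild complex samples from adjacent (real, imaginary) pairs."""
    if len(flat) % 2:
        raise ValueError("interleaved data must have an even length")
    return [complex(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


fid = [1 + 1j, 2 + 2j, 3 + 3j]
flat = interleave(fid)
print(flat)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```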

This extends to 2D and on up to N-D. Here is a 3D example so that everything is clear:

For 3D with dimensions X=3,Y=3,Z=3:
[
[
[1+1i,2+2i,3+3i],
[4+4i,5+5i,6+6i],
[7+7i,8+8i,9+9i]
],[
[1+1i,2+2i,3+3i],
[4+4i,5+5i,6+6i],
[7+7i,8+8i,9+9i]
],[
[1+1i,2+2i,3+3i],
[4+4i,5+5i,6+6i],
[7+7i,8+8i,9+9i]
]
]

When flattened, this becomes:
[
1,1i,2,2i,3,3i,4,4i,5,5i,6,6i,7,7i,8,8i,9,9i,
1,1i,2,2i,3,3i,4,4i,5,5i,6,6i,7,7i,8,8i,9,9i,
1,1i,2,2i,3,3i,4,4i,5,5i,6,6i,7,7i,8,8i,9,9i
]
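A recursive sketch of the row-major flattening for nested lists of any depth: the outermost index varies slowest, and each complex value is written as two adjacent floats. `flatten_row_major` is a hypothetical name, and nested Python lists stand in for whatever array type a real reader would use.

```python
def flatten_row_major(data):
    """Flatten a nested list of complex numbers (any depth) in row-major
    order, writing each value as adjacent real and imaginary floats."""
    flat = []

    def walk(node):
        if isinstance(node, list):
            for child in node:  # outer indices vary slowest
                walk(child)
        else:
            value = complex(node)
            flat.extend((value.real, value.imag))

    walk(data)
    return flat


block = [[1 + 1j, 2 + 2j, 3 + 3j],
         [4 + 4j, 5 + 5j, 6 + 6j],
         [7 + 7j, 8 + 8j, 9 + 9j]]
cube = [block, block, block]  # X=3, Y=3, Z=3 as in the example above
flat = flatten_row_major(cube)
print(len(flat))  # 54
print(flat[:6])   # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```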

If the array is stored in a contiguous block of memory, we can use the following pointer arithmetic to access the data:

To access the real part of the number at [x][y][z] (multiply Z by 2 since we flatten each complex number into two floats):
$$x \cdot Y \cdot Z \cdot 2 + y \cdot Z \cdot 2 + 2z$$

To access the imaginary part of the number at [x][y][z]:
$$x \cdot Y \cdot Z \cdot 2 + y \cdot Z \cdot 2 + (2z + 1)$$

So, for example, to access [1][2][2] (the value 9+9i in the second block):

In our case, X=3, Y=3, Z=3.

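$$1 \cdot 3 \cdot 3 \cdot 2 + 2 \cdot 3 \cdot 2 + 2 \cdot 2 = 34$$
$$1 \cdot 3 \cdot 3 \cdot 2 + 2 \cdot 3 \cdot 2 + (2 \cdot 2 + 1) = 35$$

These are 0-based positions in the flattened array, pointing at the 9 and 9i of the second block.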
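The pointer arithmetic generalizes to any number of dimensions. This sketch (hypothetical helper names) computes the element's row-major position, then doubles it because each complex value occupies two floats, so it equals `2*(x*Y*Z + y*Z + z)` in 3D. For [1][2][2] with X=Y=Z=3 it gives 34 and 35.

```python
def real_offset(index, shape):
    """Offset of the real part of element `index` in the flattened array.

    `index` and `shape` are ordered outermost first, e.g. (x, y, z) and
    (X, Y, Z). Each complex value occupies two floats, so the element's
    row-major position is doubled.
    """
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    position = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of range")
        position = position * n + i  # Horner-style row-major position
    return 2 * position


def imag_offset(index, shape):
    """Offset of the imaginary part: directly after the real part."""
    return real_offset(index, shape) + 1


print(real_offset((1, 2, 2), (3, 3, 3)))  # 34
print(imag_offset((1, 2, 2), (3, 3, 3)))  # 35
```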

In a real FID the dimensions are defined as follows:
Z = number of data points in the direct dimension
Y = number of data points in the first indirect dimension
X = number of data points in the second indirect dimension

@sneumann
Member

Hi, how do I get X, Y, Z from the nmrML file? I found:
Z=DirectDimensionParameterSet numberOfDataPoints="57804"
but what about X and Y?

@LuisFF
Contributor

LuisFF commented Oct 14, 2013

Hi Michael,

if you flatten the complex numbers, why do you need to store them as complex128? Wouldn't it be better to store them as long, knowing that the first entry always corresponds to the real value and the second to the imaginary one? The (Bruker) FID already uses that convention. Note that representing the signal intensities from each channel as a complex number is just a convenient (mathematical) way to represent the NMR signal.

Cheers,
Luis

@wilsonmichael
Member Author

It is a bit of a semantics issue. Basically, I think it is better to call it complex128 because it is not just an array of floats; it is an array of pairs of floats.

I have created a complex type called BinaryDataArrayType that extends the built-in base64Binary type and adds some required attributes describing how the data was encoded. We can reuse this type in different places and define the different terms we need to describe the data, so we could have complex128, float64, float32, int32, etc.
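To illustrate that complex128 and float64 can describe exactly the same bytes, here is a sketch of a decoder that dispatches on a data-type term. The `FORMATS` table and `decode_array` are hypothetical (not part of the nmrML schema), and little-endian byte order is assumed.

```python
import struct

# Hypothetical mapping from data-type terms to struct codes.
# Assumption: little-endian byte order ("<").
FORMATS = {
    "complex128": ("<d", 2),  # pair of float64: real, imaginary
    "float64": ("<d", 1),
    "float32": ("<f", 1),
    "int32": ("<i", 1),
}


def decode_array(raw, data_type):
    """Decode raw bytes according to a data-type term such as 'complex128'."""
    code, per_value = FORMATS[data_type]
    item = struct.calcsize(code) * per_value  # bytes per logical value
    if len(raw) % item:
        raise ValueError(f"byte length is not a multiple of {item}")
    values = [v for (v,) in struct.iter_unpack(code, raw)]
    if per_value == 2:
        # Re-pair adjacent floats into complex numbers.
        return [complex(values[i], values[i + 1])
                for i in range(0, len(values), 2)]
    return values


raw = struct.pack("<6d", 1.0, 1.0, 2.0, 2.0, 3.0, 3.0)
print(decode_array(raw, "complex128"))  # [(1+1j), (2+2j), (3+3j)]
print(decode_array(raw, "float64"))     # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

The bytes are identical in both calls; only the declared type decides whether they are read as three complex values or six floats.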


I did it this way because it makes the most sense to me; however, it seems to be causing some confusion.

Some alternatives: we could say it is float64 and specify that it is actually pairs of related floats in some other way, or we could use parallel arrays instead of zipping the two components together into one array.
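The two layouts are easy to convert between. A sketch of both directions, with hypothetical helper names, going from the interleaved layout to parallel real and imaginary arrays and back:

```python
def to_parallel(interleaved):
    """Split interleaved [re0, im0, re1, im1, ...] into two parallel arrays."""
    return interleaved[0::2], interleaved[1::2]


def to_interleaved(real, imag):
    """Zip two parallel arrays back into one interleaved array."""
    if len(real) != len(imag):
        raise ValueError("real and imaginary arrays must have equal length")
    out = []
    for re, im in zip(real, imag):
        out.extend((re, im))
    return out


real, imag = to_parallel([1.0, 10.0, 2.0, 20.0, 3.0, 30.0])
print(real, imag)  # [1.0, 2.0, 3.0] [10.0, 20.0, 30.0]
```

Parallel arrays would let a reader fetch only the real parts, at the cost of two separately encoded blobs instead of one.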

@wilsonmichael
Member Author

As for the dimensions:

In a real FID the dimensions are defined as follows:
Z=DirectDimensionParameterSet numberOfDataPoints="57804"
For a 1D FID there are no other dimensions.

But for 3D:
Z=DirectDimensionParameterSet numberOfDataPoints="57804"
First indirect dimension in the file:
Y=IndirectDimensionParameterSet numberOfDataPoints="57804"
Second indirect dimension in the file:
X=IndirectDimensionParameterSet numberOfDataPoints="57804"
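A sketch of reading the shape with Python's `xml.etree.ElementTree`. The XML fragment is hypothetical: it reuses the element and attribute names quoted in this thread, omits namespaces, and a real nmrML file may name or nest these elements differently. It also assumes `numberOfDataPoints` counts complex points and that the indirect dimensions appear in file order (first, then second).

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment using the element and attribute
# names quoted in this thread; real nmrML files may differ.
SAMPLE = """<acquisition>
  <DirectDimensionParameterSet numberOfDataPoints="4"/>
  <IndirectDimensionParameterSet numberOfDataPoints="3"/>
  <IndirectDimensionParameterSet numberOfDataPoints="2"/>
</acquisition>"""


def fid_shape(xml_text):
    """Return the shape in storage order (..., X, Y, Z).

    The direct dimension Z varies fastest, so it comes last. Assumption:
    indirect dimensions appear in file order (first = Y, second = X).
    """
    root = ET.fromstring(xml_text)
    direct = root.find(".//DirectDimensionParameterSet")
    if direct is None:
        raise ValueError("no DirectDimensionParameterSet found")
    z = int(direct.get("numberOfDataPoints"))
    indirect = [int(e.get("numberOfDataPoints"))
                for e in root.iter("IndirectDimensionParameterSet")]
    # Reverse so the last indirect dimension is outermost.
    return tuple(reversed(indirect)) + (z,)


print(fid_shape(SAMPLE))  # (2, 3, 4)
```

For a 1D FID there are no indirect elements, so the shape is just `(Z,)`.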
