4D-STEM data reads incorrectly #17

Closed
ercius opened this issue Jun 5, 2019 · 1 comment

ercius commented Jun 5, 2019

Newer versions of DM with the STEMX capability for 4D-STEM now write 4-dimensional DM4 files with a different shape than before. This is actually an improvement in the layout of the data on disk, but ncempy.io.dm does not currently read this data correctly.

There is a workaround. You can use the low-level API in ncempy.io.dm.fileDM to create a memmap of the data on disk with the correct ordering:

import numpy as np
import ncempy.io as nio

d1 = nio.dm.fileDM('filename.dm4')
data = np.memmap(d1.fid, dtype=d1._DM2NPDataType(d1.dataType[1]), mode='r', offset=d1.dataOffset[1],
                 shape=(d1.xSize[1], d1.ySize[1], d1.zSize[1], d1.zSize2[1])[::-1], order='C')
image0 = data[:, :, 0, 0]  # the first diffraction pattern

Note how the shape keyword uses [::-1] to reverse the order of the data shape. The shape is now [ky, kx, scanY, scanX], making the diffraction patterns the fastest data to access according to C-ordering.
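As a quick illustration (a sketch reusing the variables from the snippet above), the two kinds of slices then look like this:

# data has shape (ky, kx, scanY, scanX) after the [::-1] reversal
dp = data[:, :, 0, 0]         # full diffraction pattern at scan position (0, 0)
pixel_map = data[0, 0, :, :]  # detector pixel (0, 0) across all scan positions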

I will attempt to add automatic identification of this change in the file format, but it will take some time to figure out.

sk1p commented Dec 5, 2019

I also stumbled on this.

"The shape is now [ky, kx, scanY, scanX], making the diffraction patterns the fastest data to access according to C-ordering."

Actually, it is the other way around: in numpy, the rightmost axis is the fastest to access. You can easily check this using the strides attribute on the array, or even by benchmarking:

>>> image0.strides  # image0 from your code snippet above, applied to some sample data
(143654400, 74820)

This means that between each pixel of image0 we need to jump 74820 bytes in the input data. For fast access, this should be the itemsize of the dtype, for example 4 for float32 data.

>>> b = np.random.randn(128, 128, 128, 128)
>>> b.strides
(16777216, 131072, 1024, 8)

%timeit np.copy(b[..., 0, 0])  # on my system: 534 µs ± 25.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> b[..., 0, 0].strides
(16777216, 131072)

%timeit np.copy(b[0, 0, ...])  # on my system: 7.02 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> b[0, 0, ...].strides
(1024, 8)

Or, from your example above, compare the timing of np.copy(data[:, :, 0, 0]) with np.copy(data[0, 0, ...]). The np.copy is needed to actually measure the access cost; without it, numpy just creates a view into the memory map without touching the underlying data.
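For example (a sketch using the memmap from the snippet above; the actual numbers will depend on the file and the disk):

%timeit np.copy(data[:, :, 0, 0])  # one full diffraction pattern: large strides, slow
%timeit np.copy(data[0, 0, ...])   # one detector pixel over all scan positions: contiguous, fast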

So sadly, in this new format, while it is fast to access all scan positions for a single pixel of a diffraction pattern, accessing a whole diffraction pattern is very expensive.
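If whole diffraction patterns are what you need repeatedly, one possible workaround (just a sketch, assuming the data set fits in memory) is to make a transposed, C-contiguous copy once so that each pattern becomes contiguous:

# read everything once and reorder so (scanY, scanX) are the slow axes
# and (ky, kx) are the fast, contiguous axes
reordered = np.ascontiguousarray(data.transpose(2, 3, 0, 1))  # shape (scanY, scanX, ky, kx)
dp = reordered[0, 0]  # now a contiguous diffraction pattern at scan position (0, 0)

This pays the cost of reading the whole file once, after which each pattern is a simple contiguous read from memory.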
