Adding reader for TVIPS datastream files #2780

Closed
din14970 opened this issue Jun 25, 2021 · 2 comments

@din14970
Contributor

This issue serves mainly to announce something I'm currently working on, and to have something to reference. TVIPS is a small German company that sells fast CMOS cameras with good dynamic range. They seem to be fairly popular in Germany. We have these cameras installed on our JEOL instruments and use them for in-situ studies and various 4D-STEM experiments.

The data is collected in a rather inconvenient stream format: a main header, followed by image frames that are each preceded by a small frame header. In addition, files are capped at a certain size, so a dataset continues in additional files (file_000.tvips, file_001.tvips, ...). The files contain very little metadata, so users typically have to reconstruct the shape of the data hypercube themselves.

To deal with this data, I previously wrote a GUI-based converter program that converts it to blo, hspy or tiff. However, it is annoying to have to duplicate very large datasets before one can work with them: besides the wasted disk space, a conversion can take 10-30 min. I think at this point I have enough knowledge to implement a file reader directly in HyperSpy. I envision adding the following arguments; most of them are only relevant for 4D-STEM datasets:

  • scan_shape: in case the dataset is a 4D-STEM dataset and the original shape cannot be automatically detected
  • first_frame: index of the first frame to include in the dataset in case it cannot be automatically detected
  • last_frame: index of the last frame to include in case it cannot be automatically detected
  • winding_scan: boolean, whether the scan unit operated in "snake-scan" mode rather than flyback mode
  • hysteresis: scan-point offset applied to even scan rows to correct for misalignment in "snake-scan" mode

By default, the data would just be loaded as an image stack unless scan_shape is defined.
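
As a rough sketch of how these arguments might interact, the function below reshapes a frame stack into a 4D array. It is only an illustration, not the reader's implementation: the function name `stack_to_scan` is hypothetical, "even scan rows" is assumed to mean the rows at zero-based indices 1, 3, ..., and the hysteresis correction is modelled as a simple cyclic shift along the row.

```python
import numpy as np


def stack_to_scan(frames, scan_shape, first_frame=0,
                  winding_scan=False, hysteresis=0):
    """Reshape a (n_frames, h, w) stack into (rows, cols, h, w).

    Hypothetical helper: the argument names mirror the proposal above,
    the behaviour is an assumption made for illustration.
    """
    rows, cols = scan_shape
    n_needed = rows * cols
    selected = frames[first_frame:first_frame + n_needed]
    if selected.shape[0] != n_needed:
        raise ValueError("not enough frames for the requested scan_shape")
    data = selected.reshape(rows, cols, *frames.shape[1:])
    if winding_scan:
        data = data.copy()
        # Assumption: every second row (zero-based 1, 3, ...) was acquired
        # in the reverse direction, so flip it back.
        data[1::2] = data[1::2, ::-1].copy()
        if hysteresis:
            # Assumption: the misalignment is a constant shift of those rows.
            data[1::2] = np.roll(data[1::2], hysteresis, axis=1)
    return data
```

With `winding_scan=False` and no `scan_shape`, the data would simply stay an image stack, as described above.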
The original implementation in my GUI converter relies on a loop over the files. I hope I can do something a bit smarter with np.memmap, even though the array data is non-contiguous and possibly split over multiple files.
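
One possible way to handle the interleaved frame headers with np.memmap is a structured dtype in which each record holds one frame header plus one frame. The sketch below assumes fixed-size headers and frames; the function name and all sizes are hypothetical parameters, not the actual TVIPS layout, which would have to be read from the main header.

```python
import numpy as np


def memmap_frame_records(path, main_header_bytes, frame_header_bytes,
                         frame_shape, pixel_dtype=np.uint16):
    """Memory-map one stream file as an array of (header, data) records.

    All sizes are caller-supplied assumptions; nothing here is taken
    from the real TVIPS specification.
    """
    record = np.dtype([
        ("header", np.uint8, (frame_header_bytes,)),  # raw frame header bytes
        ("data", pixel_dtype, frame_shape),           # one image frame
    ])
    # offset skips the main header; the number of records is inferred
    # from the remaining file size.
    return np.memmap(path, dtype=record, mode="r", offset=main_header_bytes)
```

Indexing the result with `["data"]` gives a strided view of shape `(n_frames, *frame_shape)` without copying, and `["header"]` exposes the raw per-frame header bytes. A dataset split over several files would still need one such map per file.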

For original_metadata I was planning to only include the main header. However, it is possible to record additional information such as temperature in the frame metadata, so it might be handy to be able to optionally load and return this information as well.

@magnunor
Contributor

More file formats is always nice!

With regard to lazy loading + scan_shape combined with np.memmap, there are probably some clever things you can do with the chunking to get optimal performance. For example, making sure the chunks don't extend over several files, to avoid read amplification.
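
A minimal sketch of that idea along the frame axis: given the number of frames stored in each file, compute chunk lengths that never cross a file boundary. The helper name `file_aligned_chunks` is hypothetical, and it only covers the plain image-stack case; a reshaped 4D scan would need extra care.

```python
def file_aligned_chunks(frames_per_file, max_chunk):
    """Return chunk lengths along the frame axis that stay within one file.

    frames_per_file: number of frames stored in each file, in order.
    max_chunk: largest number of frames allowed in a single chunk.
    """
    if max_chunk < 1:
        raise ValueError("max_chunk must be at least 1")
    chunks = []
    for n_frames in frames_per_file:
        full, rest = divmod(n_frames, max_chunk)
        chunks.extend([max_chunk] * full)
        if rest:
            # The last chunk of a file is shorter rather than spilling
            # into the next file.
            chunks.append(rest)
    return tuple(chunks)
```

A tuple like this could then be used as the explicit chunk sizes for the frame axis of a lazy array.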

@ericpre
Member

ericpre commented Mar 26, 2022

Done in #2781.

@ericpre ericpre closed this as completed Mar 26, 2022
@ericpre ericpre added this to the v1.7 milestone Mar 26, 2022