Inclusion of a sample tablite.hdf5 file? #37

Closed
rhs3i opened this issue Nov 7, 2022 · 5 comments
Labels
question Further information is requested

Comments

@rhs3i

rhs3i commented Nov 7, 2022

Would it be out of the question to include an accompanying tablite.hdf5 file in the repo, possibly alongside one of the tests, or from a selected point in the demo?

@root-11
Owner

root-11 commented Nov 7, 2022

Hello @rhs3i

The HDF5 file will always be in the system temp directory, as:

H5_STORAGE = pathlib.Path(tempfile.gettempdir()) / "tablite.hdf5"

May I ask what you're trying to achieve?
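
For anyone following along, the path quoted above can be checked with a short sketch. It reuses the same expression; `storage_status` is a hypothetical helper name, not part of tablite.

```python
import pathlib
import tempfile

# Same expression as quoted in the thread.
H5_STORAGE = pathlib.Path(tempfile.gettempdir()) / "tablite.hdf5"


def storage_status(path=H5_STORAGE):
    """Hypothetical helper: say whether the file exists and how large it is."""
    if path.exists():
        return f"{path} exists ({path.stat().st_size} bytes)"
    return f"{path} not found"


if __name__ == "__main__":
    print(storage_status())
```

The location depends on the platform, since `tempfile.gettempdir()` consults environment variables such as `TMPDIR`, `TEMP`, and `TMP` before falling back to a default.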

@rhs3i
Author

rhs3i commented Nov 8, 2022

Certainly. I'm the author of H5s, a scanner for HDF5. The first objective is to verify the scanner renders tablite HDF5 well. HDF5 that is intended to be primarily machine-read can stress a visual model in ways perhaps not considered, but the graphical constructs should still hold up when inspecting these files. The screenshots and links to the visual vocabulary of the scanner can give some illustration.

The second objective is to get a quick bit of insight as to whether H5s can augment usage of tablite in an interesting way, but that would be a future topic.

@root-11
Owner

root-11 commented Nov 8, 2022

Hi Robert,
As you can see from the use of the temp directory, the HDF5 files are generally used as a volatile database, where data is stored in a hierarchy best described as:

  1. Tables have columns.
  2. Columns have page handlers.
  3. Page handlers have pages.
  4. Pages are one of four types: (a) Simple (int, float), (b) String (str, UTF-8), (c) Mixed (non-simple datatypes), (d) Sparse (lots of Nones).
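
The four page kinds listed above can be illustrated with a small classifier. This is only a sketch under assumptions: `classify_page` and its `sparse_threshold` are hypothetical (the thread only says "lots of Nones"), and it is not how tablite itself picks a page type.

```python
def classify_page(values, sparse_threshold=0.5):
    """Guess which of the four page kinds a list of values would fall under.

    sparse_threshold is an assumed cut-off for "lots of Nones".
    """
    if not values:
        raise ValueError("cannot classify an empty page")

    none_share = sum(v is None for v in values) / len(values)
    if none_share > sparse_threshold:
        return "sparse"

    kinds = {type(v) for v in values}
    if kinds == {int} or kinds == {float}:
        return "simple"   # one numeric type throughout
    if kinds == {str}:
        return "string"
    return "mixed"        # anything else, including a few stray Nones


if __name__ == "__main__":
    for page in ([1, 2, 3], ["a", "b"], [1, "a", 2.0], [None, None, None, 1]):
        print(page, classify_page(page))
```

Whether a page holding both ints and floats counts as Simple is not stated in the thread; the sketch treats it as mixed.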

In the tablite.hdf5 file you will therefore find that the Pages contain all the data, whilst the entries (HDF5 groups) for Tables and Columns are empty and only carry metadata in their attrs field.
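
The split between empty groups and data-bearing pages can be seen with h5py on a toy file. The group, dataset, and attribute names below are made up for illustration and do not reflect tablite's real layout; only the h5py calls (`File`, `create_group`, `create_dataset`, `visititems`, `attrs`) are real.

```python
import os
import tempfile

try:
    import h5py  # third-party: pip install h5py
except ImportError:  # lets the sketch load even where h5py is absent
    h5py = None


def build_toy_file(path):
    """Write a toy file: groups carry only attrs, one dataset carries values."""
    with h5py.File(path, "w") as f:
        table = f.create_group("table/1")            # hypothetical name
        table.attrs["columns"] = "a"                 # hypothetical attribute
        column = f.create_group("column/a")          # hypothetical name
        column.attrs["pages"] = "1"                  # hypothetical attribute
        f.create_dataset("page/1", data=[1, 2, 3])   # the only node with data


def describe(path):
    """Return one (name, kind, detail) tuple per node in the file."""
    found = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            found.append((name, "dataset", obj.shape))
        else:
            found.append((name, "group", sorted(obj.attrs.keys())))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


if h5py is not None:
    with tempfile.TemporaryDirectory() as tmp:
        toy = os.path.join(tmp, "toy.hdf5")
        build_toy_file(toy)
        for name, kind, detail in describe(toy):
            print(name, kind, detail)
```

Pointing `describe` at a real tablite.hdf5 file would be the quickest way to see the actual names in use.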

The details are explained here in the HDF5 group webinar: https://youtu.be/OoHVIKAD854?t=1415

root-11 added the question label (Further information is requested) Nov 8, 2022
@rhs3i
Author

rhs3i commented Nov 9, 2022

Ah. Thank you for the correction, and apologies for the time-wasting. I did sit in on your HDF5 webinar (thank you), but I misunderstood the design: I thought that once the HDF5 backing store had been created and had stored all the computational deltas, it would persist beyond program execution and be used by a subsequent downstream tablite processor. But your presentation was clear: the re-import/reload example you showed (39:19) ran within a single program session. Scanning a volatile HDF5 datastore might have some utility in a debugging capacity, but that's another matter and may not be very useful.
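
If a scan of the volatile store is ever wanted for debugging, one option is to copy the file out of the temp directory while it still exists. A minimal sketch, assuming nothing is writing to the file at that moment; `snapshot` is a hypothetical helper, not a tablite function.

```python
import pathlib
import shutil
import tempfile


def snapshot(source, dest_dir):
    """Copy `source` into `dest_dir`; return the new path, or None if missing."""
    source = pathlib.Path(source)
    if not source.is_file():
        return None
    dest_dir = pathlib.Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also tries to preserve timestamps and other file metadata
    return pathlib.Path(shutil.copy2(source, dest_dir / source.name))


if __name__ == "__main__":
    source = pathlib.Path(tempfile.gettempdir()) / "tablite.hdf5"
    print(snapshot(source, pathlib.Path.cwd() / "h5_snapshots"))
```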

Appreciate your time in getting me straightened out on this.

rhs3i closed this as completed Nov 9, 2022
@root-11
Owner

root-11 commented Nov 9, 2022

No problem. Happy I could help.
