Support segmentation and non-segmentation in more decompression kernels #13

Open
eyalroz opened this issue May 1, 2017 · 1 comment

@eyalroz
Owner

eyalroz commented May 1, 2017

(copied from Issue #163 in the kernel testbench)

At the moment, most of our decompressors can be used with segment anchors or without them - but not both:

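| Scheme | Segmented | Unsegmented |
|--------|-----------|-------------|
| BITMAP | N/A | 🗹 |
| DELTA | 🗹 | 🗷 |
| DICT | 🗷 | 🗹 |
| FOR | 🗹 | 🗷 |
| MODEL | 🗷 | 🗹 |
| NS | N/A | 🗹 |
| NSV | 🗹 | 🗷 |
| RPE | 🗹 | 🗷 |
| RLE | 🗹 | 🗷 |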

First, segmented execution is important even if it is the only option offered, so MODEL and DICT should definitely have it. Then it would be nice to support, at least for the sake of completeness, the unsegmented versions of the schemes that currently lack them: especially DELTA, for benchmarking purposes, and RPE, for cases where the overall support of the column is so small that segmentation is mostly a hassle.
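
To make the distinction concrete, here is a minimal host-side sketch of the two variants for DELTA. It assumes that a segment anchor is the absolute value immediately preceding a segment's first element, and that all segments share one length; the function names and signatures are hypothetical and are not taken from the existing kernels.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unsegmented: a single running sum over the whole column, so element i
// depends on every delta before it.
std::vector<std::int32_t> decompress_delta_unsegmented(
    std::int32_t baseline, const std::vector<std::int32_t>& deltas)
{
    std::vector<std::int32_t> out(deltas.size());
    std::int32_t running = baseline;
    for (std::size_t i = 0; i < deltas.size(); ++i) {
        running += deltas[i];
        out[i] = running;
    }
    return out;
}

// Segmented: anchors[s] is (by assumption) the absolute value just before
// segment s begins, so each segment can be decoded independently of the rest.
std::vector<std::int32_t> decompress_delta_segmented(
    const std::vector<std::int32_t>& anchors,
    const std::vector<std::int32_t>& deltas,
    std::size_t segment_length)
{
    std::vector<std::int32_t> out(deltas.size());
    for (std::size_t s = 0; s < anchors.size(); ++s) {
        const std::size_t begin = s * segment_length;
        const std::size_t end = std::min(begin + segment_length, deltas.size());
        std::int32_t running = anchors[s];
        for (std::size_t i = begin; i < end; ++i) {
            running += deltas[i];
            out[i] = running;
        }
    }
    return out;
}
```

In a kernel, the outer loop of the segmented version is what would be spread across thread blocks, while the unsegmented version needs a column-wide prefix sum instead.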

eyalroz commented May 2, 2017

For the DICT scheme, we'll need to choose between uniformity and flexibility.

In the uniform extreme of the spectrum, we'll have:

  • Fixed-size dictionaries
  • Fixed element size per dictionary
  • An actual new dictionary copied in for every segment of the compressed data (even if it's very similar to the previous segment's dictionary)
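
A rough host-side sketch of what the uniform layout could look like, just to fix ideas. The constants, type names and the `decompress` function are all hypothetical, and the sizes are kept tiny for illustration; none of this is existing code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameters, fixed for the whole column (values are illustrative).
constexpr std::size_t dictionary_size = 4; // entries in every dictionary
constexpr std::size_t segment_length  = 4; // compressed elements per segment

using entry_type = std::uint32_t; // one entry width shared by all dictionaries
using index_type = std::uint8_t;  // wide enough to address dictionary_size entries

using dictionary = std::array<entry_type, dictionary_size>;

struct uniform_dict_column {
    std::vector<dictionary> dictionaries; // a full copy per segment, even if nearly identical
    std::vector<index_type> indices;      // one dictionary index per compressed element
};

std::vector<entry_type> decompress(const uniform_dict_column& column)
{
    std::vector<entry_type> out(column.indices.size());
    for (std::size_t i = 0; i < column.indices.size(); ++i) {
        // The segment number alone locates the dictionary; no descriptor is needed.
        const dictionary& dict = column.dictionaries[i / segment_length];
        out[i] = dict[column.indices[i]];
    }
    return out;
}
```

The appeal is that locating a segment's dictionary is pure arithmetic; the cost is the duplicated entries whenever consecutive dictionaries barely differ.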

And in the flexible extreme (or close to it) we'll have:

  • A variable-length, and variable-width, array of dictionary entry data
  • For each segment, a dictionary descriptor:
    • An indication of where the dictionary begins in the variable-length dictionary data
    • The dictionary's length (number of entries)
    • (Possibly) the dictionary index size, in bytes or in bits. This could theoretically be deduced from the dictionary's length, but that perhaps depends too much on the decompressing software's capabilities.

Note that a segment might simply refer to the same dictionary as its predecessor; we might even allow it to expand its predecessor's dictionary by starting at the same place and extending further.
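
A minimal host-side sketch of the flexible layout, under some simplifying assumptions that are mine rather than decisions: entries have a single fixed width (the variable-width part is left out), indices are stored unpacked, and all names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-segment descriptor: where the segment's dictionary lives in the shared
// entry data, and how many entries it has.
struct dictionary_descriptor {
    std::size_t offset; // first entry of this dictionary within `entries`
    std::size_t length; // number of entries
};

struct flexible_dict_column {
    std::vector<std::uint32_t>         entries;     // all dictionaries' entries, back to back
    std::vector<dictionary_descriptor> descriptors; // one per segment
    std::vector<std::uint32_t>         indices;     // unpacked here for simplicity
    std::size_t                        segment_length;
};

// Index width deduced from the dictionary length, in case the descriptor
// does not store it explicitly: the smallest number of bits that can
// distinguish `length` entries.
constexpr unsigned index_bits_for(std::size_t length)
{
    if (length == 0) { return 0; }
    unsigned bits = 0;
    for (std::size_t n = length - 1; n > 0; n >>= 1) { ++bits; }
    return bits;
}

std::vector<std::uint32_t> decompress(const flexible_dict_column& column)
{
    std::vector<std::uint32_t> out(column.indices.size());
    for (std::size_t i = 0; i < column.indices.size(); ++i) {
        const dictionary_descriptor& d = column.descriptors[i / column.segment_length];
        out[i] = column.entries[d.offset + column.indices[i]];
    }
    return out;
}
```

With this layout, reusing the predecessor's dictionary is just a repeated descriptor, and extending it means keeping the same `offset` with a larger `length`.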

I'm leaning toward the more flexible extreme.
