Why are strides ignored in zfp_field_metadata()? #230
That is indeed intentional. Strides are a property of the organization of the data in memory as the data is compressed and written, with the compressed output being organized differently. Consumers do not necessarily want to maintain those strides when later reading and decompressing the data. Requiring that could, for instance, blow up memory requirements for the consumer if the original data is not stored contiguously. As an example, the original layout could be in array-of-struct form, perhaps with dozens of different fields being written one at a time by a simulation code. If later data analysis is to be done on a single field, you don't want to have to recreate the original layout that wastes storage on all but one field. If the same data layout is desired during decompression, then that can be accomplished by setting strides during decompression, though the strides would have to be maintained separately. |
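As a rough illustration of keeping strides "separately" from the zfp header, the sketch below serializes a pair of 2D strides into a small side-car record that a producer could store next to the compressed stream. The type `stride_record_2d` and both functions are hypothetical names made up for this example and are not part of zfp; the 16-byte little-endian layout is likewise just an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical side-car record (NOT part of zfp): strides of a 2D field,
   measured in scalars rather than bytes. */
typedef struct {
  int64_t sx; /* distance between consecutive samples along x */
  int64_t sy; /* distance between consecutive rows along y */
} stride_record_2d;

enum { STRIDE_RECORD_2D_BYTES = 16 };

/* Write both strides as little-endian 64-bit two's-complement integers. */
static void stride_record_2d_write(const stride_record_2d* r,
                                   unsigned char out[STRIDE_RECORD_2D_BYTES])
{
  uint64_t v[2];
  assert(r != NULL && out != NULL);
  memcpy(&v[0], &r->sx, sizeof v[0]);
  memcpy(&v[1], &r->sy, sizeof v[1]);
  for (int i = 0; i < 2; i++)
    for (int b = 0; b < 8; b++)
      out[8 * i + b] = (unsigned char)(v[i] >> (8 * b));
}

/* Read back a record produced by stride_record_2d_write(). */
static stride_record_2d stride_record_2d_read(
  const unsigned char in[STRIDE_RECORD_2D_BYTES])
{
  uint64_t v[2] = { 0, 0 };
  stride_record_2d r;
  assert(in != NULL);
  for (int i = 0; i < 2; i++)
    for (int b = 0; b < 8; b++)
      v[i] |= (uint64_t)in[8 * i + b] << (8 * b);
  memcpy(&r.sx, &v[0], sizeof r.sx);
  memcpy(&r.sy, &v[1], sizeof r.sy);
  return r;
}
```

A consumer that does want the producer's layout could read this record first and then hand the two values to `zfp_field_set_stride_2d()` before decompressing; a consumer that prefers a contiguous layout can simply ignore it.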
Thanks for the detailed answer. The scenario I am interested in is compression of a 2D vector field as two strided 2D scalar fields. I believe that in such a scenario I am bound either to manually (de)interleave the vector components into separate planes before compression and after decompression, or to rely on zfp's internal handling of strides. As you suggested, storing strides externally and providing them during decompression seems to be a solution, although I would argue that deriving all the metadata required to set up decompression directly from the header would be more convenient. A bit tangential: what is the rationale for not directly storing all fields of zfp_field except the data pointer in the header? It seems that the cost of such a header is negligible with respect to the compressed stream itself. |
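To make the two options in this comment concrete, here is a minimal sketch of the manual route: gathering each component of an interleaved 2D vector field into its own contiguous plane. The helper names `gather_strided_2d` and `deinterleave_vec2` are hypothetical, and the interleaved layout `u0 v0 u1 v1 ...` in row-major order is an assumption. The strides used (2 along x, 2*nx along y, in units of scalars) are the same values one would, as far as I understand the zfp API, pass to `zfp_field_set_stride_2d()` to let zfp walk the interleaved buffer directly.

```c
#include <assert.h>
#include <stddef.h>

/* Copy an nx-by-ny scalar field, laid out with strides (sx, sy) measured in
   scalars, into a contiguous row-major buffer `out` of nx*ny floats. */
static void gather_strided_2d(const float* base, size_t nx, size_t ny,
                              ptrdiff_t sx, ptrdiff_t sy, float* out)
{
  assert(base != NULL && out != NULL);
  for (size_t y = 0; y < ny; y++)
    for (size_t x = 0; x < nx; x++)
      out[y * nx + x] = base[(ptrdiff_t)x * sx + (ptrdiff_t)y * sy];
}

/* Split an interleaved vector field (u0 v0 u1 v1 ..., row-major) into two
   contiguous planes u and v, each holding nx*ny floats. */
static void deinterleave_vec2(const float* uv, size_t nx, size_t ny,
                              float* u, float* v)
{
  const ptrdiff_t sx = 2;                 /* skip over the other component */
  const ptrdiff_t sy = 2 * (ptrdiff_t)nx; /* one interleaved row */
  gather_strided_2d(uv + 0, nx, ny, sx, sy, u);
  gather_strided_2d(uv + 1, nx, ny, sx, sy, v);
}
```

The only difference between the two components is the base pointer offset (0 or 1); the strides are identical, which is why a single pair of stride values describes both scalar fields.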
As mentioned above, one rationale is that the consumer may not want to organize the data the same way the producer does. In fact, I cannot think of a case where the consumer, which processes the data, does not know how it wants the data to be organized. Can you think of a scenario where it would be beneficial to have the producer dictate the data layout for the consumer? In the case of a code processing 2D vector fields, the consumer needs to know if the data layout is
Another rationale is that we have gone to great lengths to make the storage of metadata and compression parameters as compact as possible; in most cases, we encode array dimensions, scalar type, and compression mode and parameters in only 64 bits. This compact encoding is motivated by zfp's unique approach to representing large arrays as a collection of very small blocks (consisting of 4^d values in d dimensions) that can be (de)compressed independently. We early on anticipated the potential to vary compression parameters spatially, perhaps even from one block to the next, and in that case the overhead of storing compression parameters becomes large. Similarly, in certain applications (like AMR), one may form a larger grid as a collection of smaller ones, with each subgrid composed of a small collection of zfp blocks. In this case, it is again important to keep array metadata per subgrid small. One may even vary precision spatially (e.g., float vs. double), where again you need an efficient way of encoding scalar type.
Whereas individual array dimensions are often small (say, 16 bits or less), strides are not only signed but may span the product of all dimensions or even more (when multiple fields are interleaved), making them far costlier to encode. In practice, you often need more than 32 bits per stride, or more than 96 bits for the 2D vector field example above.
Now, I can envision a case where the consumer (perhaps an I/O module) is tasked only with reconstructing the original data bit for bit.
Using the current zfp API, it would be possible to add a new |
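The point above about strides being costlier to encode than dimensions can be checked with a little arithmetic. The sketch below counts the bits needed for an unsigned dimension and for a signed stride, using a simple sign-plus-magnitude count; this is only an estimate chosen for illustration and is not how zfp encodes anything. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent the unsigned value n (0 needs 0 bits). */
static unsigned bits_unsigned(uint64_t n)
{
  unsigned bits = 0;
  while (n) {
    bits++;
    n >>= 1;
  }
  return bits;
}

/* Bits for a signed stride under a sign-plus-magnitude count:
   one sign bit plus the bits of |s|. */
static unsigned bits_signed_stride(int64_t s)
{
  assert(s != INT64_MIN); /* -s would overflow */
  uint64_t mag = (uint64_t)(s < 0 ? -s : s);
  return 1 + bits_unsigned(mag);
}
```

With a 16-bit dimension nx = 65535 and two interleaved components, the row stride 2*nx = 131070 already needs 18 bits under this count. If a third dimension is added with ny = 65535, the slice stride 2*nx*ny = 8589672450 needs 34 bits, which is consistent with the remark that a stride often takes more than 32 bits.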
The need to design for such use-case answers my question, thanks.
This is pretty much the case for my usage scenario. |
In zfp_field_metadata(), which is called by zfp_write_header(), only the field dimensions are stored, while strides are ignored. Is that a bug, or are strides intentionally left unhandled?