zran enable down-version zlib random-access clients using byte-aligned indexes#801

Closed
jnorthrup wants to merge 1 commit into madler:develop from jnorthrup:develop

@jnorthrup

The zran indexes and similar tools include compressed entry points that are not byte-aligned.

I believe that byte-aligned compressed entry points simplify the index and its serialization, and relax the zlib version requirement. This matters particularly where Oracle JDK and JZLib Java clients are bound to zlib 1.1.x, which works for most files you encounter but cannot call inflatePrime.

@jnorthrup jnorthrup changed the title zran even-byte index points zran enable down-version zlib random-access using byte-aligned indexes Apr 13, 2023
@jnorthrup jnorthrup changed the title zran enable down-version zlib random-access using byte-aligned indexes zran enable down-version zlib random-access clients using byte-aligned indexes Apr 13, 2023
@madler
Owner

madler commented Apr 13, 2023

I have updated zran.c with a compile-time #define NOPRIME, which will use a substitute for the inflatePrime() function: 7e6dc42. This still permits entry points at arbitrary bit locations.

However neither that commit nor your PR would solve the stated problem, which is to get zran.c to work with zlib 1.1.x. inflatePrime() was introduced in zlib 1.2.3, but Z_BLOCK was introduced in zlib 1.2.1. You need the Z_BLOCK functionality of inflate() in order to find the block boundaries, be they on byte boundaries or not.

The linked commit will allow zran.c to work with zlib versions 1.2.1 and 1.2.2, and any incomplete zlib clones that don't have inflatePrime(), but that do have Z_BLOCK for inflate().

@madler madler closed this Apr 13, 2023
@jnorthrup
Author

> However neither that commit nor your PR would solve the stated problem, which is to get zran.c to work with zlib 1.1.x. inflatePrime() was introduced in zlib 1.2.3, but Z_BLOCK was introduced in zlib 1.2.1. You need the Z_BLOCK functionality of inflate() in order to find the block boundaries, be they on byte boundaries or not.

I do not fully comprehend the scope of your claims; please forgive the naivety.

If I write the zran index at byte-aligned entry points with any recent version of zlib, can I follow up with a reader client of that index that reads the 32K window, then points the zstrm inflater's input buffer at the input file, seeked to the correct location, and continues inflating? I don't understand why Z_BLOCK is a requirement in this situation since, by my understanding, the inflater simply needs the 32K window to prime the dictionary.

@madler
Owner

madler commented Apr 14, 2023

If your application is to use byte-aligned entry points (generated somewhere else) with a variant of zlib that does not have inflatePrime(), then no changes to zran.c would be needed at all. Simply link it with a dummy inflatePrime() routine, which will never be called if point->bits is always zero.

My commit permits entry points at any bit offset, even if the zlib variant does not have inflatePrime(). That is a better solution, since candidate byte-aligned entry points occur one-eighth as often. It also has the benefit that no change to zran.c is required for the generation of the index.

And yes, the client side does not need a zlib inflate() with Z_BLOCK, if it does not need to generate the index.

@jnorthrup
Author

jnorthrup commented Apr 14, 2023 via email

@madler
Owner

madler commented Apr 14, 2023

Sorry. I can't make any sense out of any of the content in your last comment.

@jnorthrup
Author

jnorthrup commented Apr 14, 2023 via email

@madler
Owner

madler commented Apr 14, 2023

The stated problem was zran.c not working with zlib 1.1.x.

zran.c has both the index generation and the indexed access. Your PR would not solve that problem, since the index generation is not possible with zlib 1.1.x, due to its lack of Z_BLOCK.

If all you want is the indexed access part of zran.c, then your PR is not needed at all. As I said, if your entry points all have a point->bits of zero, i.e., they are all byte-aligned entry points, then simply link zran.c with a dummy inflatePrime() function that does nothing, and make no changes to the indexed access of zran.c at all.
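
For illustration, a dummy routine of the kind described above might look like the following minimal sketch. It is not taken from zran.c or the linked commit. The `z_stream_stub` type and the `STUB_*` constants are hypothetical stand-ins so the fragment compiles without zlib headers; in a real build one would include `<zlib.h>` and use `z_streamp`, `Z_OK`, and `Z_STREAM_ERROR`. Returning an error for a nonzero `bits` is an added safety assumption; a routine that does nothing at all would also satisfy the description, provided every entry point has `point->bits` equal to zero.

```c
/* Hypothetical stand-in for zlib's z_stream, so this sketch compiles
   without zlib headers. A real build would use z_streamp from <zlib.h>. */
typedef struct z_stream_stub z_stream_stub;

#define STUB_OK 0               /* same value as zlib's Z_OK */
#define STUB_STREAM_ERROR (-2)  /* same value as zlib's Z_STREAM_ERROR */

/* Dummy inflatePrime() for zlib variants that lack the real one.
   With only byte-aligned entry points, point->bits is always zero and
   the indexed-access code never reaches this call. A nonzero bits
   value would mean the index is not byte-aligned, so report an error
   rather than silently continuing. */
int inflatePrime(z_stream_stub *strm, int bits, int value)
{
    (void)strm;   /* unused: no bits are ever inserted */
    (void)value;  /* unused */
    return bits == 0 ? STUB_OK : STUB_STREAM_ERROR;
}
```

The stub exists only to satisfy the linker; it does not insert any bits into the stream.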

@madler
Owner

madler commented Apr 14, 2023

Aside from your PR a) not solving the problem, and b) not even being needed in the first place, it would also completely break zran.c. Obviously those changes would prevent the indexed access part from working with the index generation part, which would still be making indexes into arbitrary bit locations. So I'm not clear on why you submitted it.

@madler
Owner

madler commented Apr 14, 2023

The value of allowing entry points at any bit location is that there are eight times as many candidate entry points available to choose from, as compared to only allowing them at byte boundaries. An entry point needs to be at the start of a deflate block. Each deflate block will generate on the order of a few tens to a few hundreds of K bytes of uncompressed data. When you ask zran for an entry point every megabyte, the distance between entry points will be at least a megabyte, but likely a few tens to a few hundreds of K bytes more than that, since it has to wait for the start of the next deflate block after a megabyte has gone by.

If you only permit byte-aligned entry points, you will have to wait much longer for the start of a deflate block that happens to start on a byte boundary. About eight times as long. So you will have to go a few hundreds of K to a few megabytes of uncompressed data until you finally run across a deflate block on a byte boundary.
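
The factor of eight can be made concrete with a rough estimate. The sketch below is illustrative only and is not part of zran.c; the function name `expected_overshoot` and the uniform-alignment model are assumptions. It treats each deflate block start as equally likely to fall on any of the eight bit positions within a byte, so a byte-aligned-only index can use about one block start in eight.

```c
/* Rough estimate of how much uncompressed data passes, beyond the
   requested span, before a usable entry point appears.
   Assumption (not from zran.c): block starts fall uniformly on the
   eight bit positions within a byte, so a byte-aligned-only index
   accepts about one block start in eight. */
double expected_overshoot(double block_bytes, int byte_aligned_only)
{
    /* Probability that a given block start is usable as an entry point. */
    double p = byte_aligned_only ? 1.0 / 8.0 : 1.0;

    /* Geometric distribution: on average 1/p block starts are examined,
       each about block_bytes of uncompressed data apart. */
    return block_bytes / p;
}
```

With a hypothetical block size of 100,000 uncompressed bytes, `expected_overshoot(100000.0, 0)` gives about 100,000 bytes, while `expected_overshoot(100000.0, 1)` gives about 800,000 bytes, in line with the "about eight times as long" estimate above.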

@jnorthrup
Author

jnorthrup commented Apr 14, 2023 via email
