Expose decoder/encoder primitives as public #72

chyyran · 2022-05-11T05:54:17Z

I maintain a Rust implementation of MAME's Compressed Hunks of Data format, which is essentially chunks of data compressed using various compression algorithms, LZMA included. One of the quirks of this format is that because they use a very old LZMA 19.0, the encoding parameters are not saved into the output stream, and therefore every CHD decoder has to essentially mimic the defaults of LZMA 19.0.

Thankfully lzma-rs allows this but since the encode and decode modules aren't public, I had to fork and vendor lzma-rs to access those primitives like LzmaParams in my crate. Additionally I added LzmaParams::new to construct an instance manually for this purpose.

Ideally I would like to just be able to use mainline lzma-rs. Since decoder is mostly documented already, would it be feasible to expose those primitives as public without much work, possibly behind a feature flag?

The text was updated successfully, but these errors were encountered:

gendx · 2022-05-30T20:33:20Z

Since decoder is mostly documented already, would it be feasible to expose those primitives as public without much work, possibly behind a feature flag?

It surely would be feasible! Could you send a pull request for review?

I don't think a feature flag is needed here - just make sure that the newly exposed API is documented and that the GitHub workflows pass.

chyyran · 2022-05-30T23:10:47Z

I did some internal refactoring to expose the raw decoder as a primitive where the LZMA compression parameters are encoded aas const generics. I also did a similar wrapper for LZMA2 although I'm not too sure about the usefulness of that in my fork.

I believe this is a better approach than just exposing LzmaParams and I was able to expose this API without changing the existing public API, but at the cost of having to make some relatively sizeable internal changes, particularly with DecoderState to also allow reusing the existing allocation.

If you could take a quick look to see if you're comfortable with those changes and the shape of the LzmaDecoder and Lzma2Decoder APIs I'd be happy to clean it up slightly and PR it upstream.

gendx · 2022-05-31T08:34:47Z

Some comments:

In the current state of things, I'm not sure about the value of using const generics to encode the compression parameters. The LzmaDecoder struct just converts these const generics parameters into simply calling LzmaParams. It seems to me that something more useful could be achieved with feature(generic_const_exprs) (i.e. literal_probs could be a const generic array rather than a Vec), but this is still an unstable feature. I'm open to exploring that if it's possible to gate the use of const generics by a feature flag (i.e. on stable Rust lzma-rs compiles without using any const generics, and on nightly it can leverage const generics if the feature is enabled).
Having a reset function that fills the default values in place rather than re-allocating and copying seems like a valuable addition :) It's unclear how much improvement this brings after compiler optimizations (the compiler may replace the array copy by a simple in-place writing of the default data) - but the intent is clearer and it also compiles better in debug mode.
What was the reason to moving the output outside of the decoder state (07c2d7e)? This adds quite a bit of verbosity in passing it along on each function call.

Feel free to break down your changes into multiple PRs (exposing LzmaParams, improving the reset implementation, adding some const generics, etc.) to make it easier to review.

chyyran · 2022-06-01T03:19:06Z

Fair criticism of using const generics. In my fork the motivation was mostly for expressiveness (i.e. an LzmaDecoder<3,0,2> would be type-incompatible with an LzmaDecoder<0,0,0>) but I'm open to feature-gating this for consumers that need it or just scrapping it all together in favour of using LzmaParams directly or a new_with_params constructor.
After some unscientific benchmarking, reset was slightly faster in Release as well, but this is heavily dependent on compiler optimizations and specific usage. I think intentwise, reset is a good to have.
Taking output outside of the decoder state was a necessary prerequisite for a simpler reset.
- Conceptually, reset prepares the decoder state for a new buffer of input that may be written into a new output. Keeping output in the decoder state tied the lifetime of the output buffer to the state of the decoder which wouldn't make much sense for a mutable reset function. This wasn't an issue before since the public APIs were one-shot.
- Keeping the lifetime attached meant that a new DecoderState must be created for each possible new output where 'decoder : 'output. There is some added verbosity at the callsite but I think separating the lifetime of the decoder state and the output more than makes up for the complexity budget there.
- Detaching the lifetime of output from DecoderState also allowed LzmaDecoder to not be lifetime-parameterized and reuse the same DecoderState for different output buffers.

With that said I'd be happy to split decoupling output from DecoderState into a separate PR since that doesn't affect any public APIs and build on top of that.

gendx added the enhancement label May 30, 2022

This was referenced May 30, 2022

Enable no_std support #43

Open

How to set dictSize on lzma2_decompress #12

Open

chyyran mentioned this issue Jun 1, 2022

Detach lifetime of output buffer from DecoderState #73

Merged

1 task

chyyran mentioned this issue Jun 9, 2022

Expose API for raw decoder primitives for LZMA and LZMA2. #74

Merged

1 task

gendx closed this as completed in #74 Aug 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose decoder/encoder primitives as public #72

Expose decoder/encoder primitives as public #72

chyyran commented May 11, 2022 •

edited

Loading

gendx commented May 30, 2022

chyyran commented May 30, 2022

gendx commented May 31, 2022

chyyran commented Jun 1, 2022 •

edited

Loading

Expose decoder/encoder primitives as public #72

Expose decoder/encoder primitives as public #72

Comments

chyyran commented May 11, 2022 • edited Loading

gendx commented May 30, 2022

chyyran commented May 30, 2022

gendx commented May 31, 2022

chyyran commented Jun 1, 2022 • edited Loading

chyyran commented May 11, 2022 •

edited

Loading

chyyran commented Jun 1, 2022 •

edited

Loading