
Decompression of a stream in output chunks without dictionary #32

Closed
maximevince opened this issue Sep 15, 2020 · 9 comments

@maximevince

From the README, I gather:

Handling of output (decompressed) stream:

  1. In-memory decompression, where output stream fully resides in memory.
  2. Streaming decompression, which allows processing arbitrary-sized streams (longer than available memory), but requires an in-memory buffer for the Deflate dictionary window.
  3. Application specifies number of output bytes it wants to decompress, which can be as high as UINT_MAX to decompress everything into memory at once, or as low as 1 to decompress byte by byte, or any other value to decompress a chunk of that size.
  • Option 1 is not feasible on the target MCU (not enough RAM)
  • Option 2 works on a test application, but is not feasible on the target MCU, if the dictionary must be 32kB (not enough RAM)

That leaves me with:
Option 3: What I am trying to do is to decompress 128 bytes (or any other chunk size) at a time from a stream. (Streaming the input in 128-byte chunks works fine!)

This 128-byte output works, once.

After that, if I re-init with d.dest_start = d.dest = uz_data_dest;,
decompressing the next series of bytes fails with error -3.

I found out that this is because the LZSS offset (lzOff) looks for previous occurrences of the same data relative to the current position in the output buffer (hence d->dest[0] = d->dest[d->lzOff]). But since the output buffer has been re-initialized, these relative offsets are no longer valid. So that sounds logical.
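To make the failure mode concrete, here is a minimal illustration (not uzlib's actual code) of how an LZ back-reference copy works: the offset is negative, so every output byte is read from earlier in the same output buffer. Resetting that buffer between chunks invalidates exactly these reads.

```c
#include <assert.h>
#include <string.h>

/* Illustrative sketch: copy `len` bytes, each read `lz_off` positions
 * back in the same output buffer (lz_off is negative), mirroring the
 * d->dest[0] = d->dest[d->lzOff] pattern described above. */
static unsigned char *lz_copy(unsigned char *dest, int lz_off, unsigned len)
{
    while (len--) {
        *dest = dest[lz_off];   /* read relative to current output position */
        dest++;
    }
    return dest;
}
```

For example, with `lz_off = -3` and `len = 6`, writing after an initial "abc" produces "abcabcabc"; each byte comes from 3 positions back, which only works while those earlier bytes are still reachable in the buffer.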

I gather that's why a dict is needed in this case. Sure enough, I added a 32 kB RAM buffer to be used as the dictionary, and it worked.
However, on the target embedded system, I don't have 32kB of RAM...

So the question is:

  • Can decompressing in chunks (writing to the same RAM location for every chunk) be done without a dictionary?
  • If not, how big must the dictionary really be? Can a size smaller than 32 kB be used, and what are the conditions?
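On the second question, a self-contained sketch (illustrative only; names are not uzlib's) of the usual answer: keep a small ring buffer holding the last DICT_SIZE output bytes across chunk resets. DICT_SIZE must be at least the largest back-reference distance in the stream, which is bounded by the window size the compressor was configured with, up to 32768 bytes for standard Deflate, less if the stream was compressed with a smaller window.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring-buffer dictionary surviving across output-chunk
 * resets. DICT_SIZE is a power of two so modulo indexing stays cheap. */
#define DICT_SIZE 256

typedef struct {
    unsigned char dict[DICT_SIZE];
    size_t total;               /* total bytes output so far */
} dict_win;

/* Record every decompressed byte as it is emitted. */
static void dict_put(dict_win *w, unsigned char c)
{
    w->dict[w->total++ % DICT_SIZE] = c;
}

/* Read the byte `dist` positions back; valid while dist <= DICT_SIZE
 * and dist <= total. */
static unsigned char dict_back(const dict_win *w, size_t dist)
{
    return w->dict[(w->total - dist) % DICT_SIZE];
}
```

With this layout the RAM cost is exactly the compressor's window size, independent of the output chunk size used for flushing.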

With some explanation about the exact usage of these cases, I would be glad to provide some example code to be included in the repo, if that would help as documentation for the project!

Thanks for the project! 👍

@maximevince
Author

As an update: I found a way to decompress one chunk of X bytes at a time to RAM, write it to flash memory (my use-case is a compressed firmware image for OTA firmware upgrades), reset the RAM output buffer (d->dest), and then continue decompressing, without the need for a 32K dictionary RAM buffer.

By detecting whenever lzOff points outside (i.e. before the start of) the d->dest_start buffer, I can still access that data in the 'external' buffer.

A first proof-of-concept is implemented here: maximevince@c528e63
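A rough sketch of that idea (illustrative names, not the actual patch): when a back-reference points before the start of the current RAM chunk, translate it to an absolute position in the already-flushed output and fetch the byte via a callback (here a plain array stands in for flash).

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char (*ext_read_fn)(size_t abs_pos);

/* Resolve one LZ back-reference: read from the RAM chunk if the
 * source is still inside it, otherwise from the flushed output. */
static unsigned char backref_byte(const unsigned char *dest,       /* current output position */
                                  const unsigned char *dest_start, /* start of RAM chunk */
                                  size_t flushed,   /* bytes flushed before dest_start */
                                  int lz_off,       /* negative LZ offset */
                                  ext_read_fn ext_read)
{
    const unsigned char *src = dest + lz_off;
    if (src >= dest_start)
        return *src;                    /* still inside the RAM chunk */
    /* Before the chunk: map to an absolute position in the flushed output. */
    return ext_read(flushed - (size_t)(dest_start - src));
}

/* Demo backing store standing in for already-flushed flash contents. */
static unsigned char flash_store[] = "0123456789";
static unsigned char flash_read(size_t pos) { return flash_store[pos]; }
```

The extra indirection only triggers for references crossing the chunk boundary; in-chunk references stay as fast as before.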

After testing this, I realized there are many use-cases for this:
Tiny embedded devices with only limited RAM (so a few buffers of 100s of bytes are OK, but not 32K), but with access to a lot larger ROM memories (e.g. internal or external flash, EEPROM, SDcard, ...) .

But there might be a cleaner way to implement this than introducing the extra d->dest_ext pointer etc. It might make sense to simply implement outbuf_put() and outbuf_get() callback functions, which would allow the user of the library to do whatever is needed to 'write' to or 'read' from that output buffer. This decouples the library from the need for RAM to buffer data.
Of course, this is going to be a lot slower than reading/writing directly from RAM, but in many use-cases (like the firmware upgrade case) that does not matter.
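The proposed interface could look roughly like this (a hypothetical shape, not an existing uzlib API): the library emits and re-reads output bytes only through user callbacks, so the user decides where the data lives (RAM, flash, SD card, ...).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback-based output buffer. */
typedef struct {
    void (*outbuf_put)(void *user, unsigned char c);     /* append one byte */
    unsigned char (*outbuf_get)(void *user, size_t pos); /* read absolute position */
    size_t out_len;                                      /* bytes written so far */
} uz_outbuf;

/* An LZ back-reference copy expressed purely via the callbacks. */
static void lz_emit(uz_outbuf *o, void *user, size_t dist, size_t len)
{
    while (len--) {
        unsigned char c = o->outbuf_get(user, o->out_len - dist);
        o->outbuf_put(user, c);
        o->out_len++;
    }
}

/* Demo backing store standing in for flash. */
static unsigned char store[64];
static size_t store_len;
static void store_put(void *u, unsigned char c) { (void)u; store[store_len++] = c; }
static unsigned char store_get(void *u, size_t pos) { (void)u; return store[pos]; }
```

Every byte now costs two indirect calls, which is the speed trade-off mentioned above; for write-once flows like firmware flashing that is usually acceptable.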

Let me know what you think, and what kind of interface you would prefer, and I can send a pull-request.

@d-a-v

d-a-v commented Sep 15, 2020

For the exact same purpose, I had started a similar job, FWIW, never proposed as a pull request. I implemented your idea through macros ALIGN_READ / ALIGN_WRITE that have default definitions for the legacy behavior, but can be redefined to something else (e.g. reading from flash) when the dictionary is disabled.
This is a proof of concept that has been proposed nowhere, and has proven to work in an emulation environment only.
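Since that patch is not public, here is a sketch of what the described macro approach presumably looks like: the defaults preserve plain RAM access, and a port can redefine them before building the library to route back-reference accesses elsewhere.

```c
#include <assert.h>
#include <string.h>

/* Default definitions, matching legacy in-RAM behavior; a port would
 * redefine these (e.g. to flash read/write helpers) when building
 * without a dictionary. Names mirror the comment above. */
#ifndef ALIGN_READ
#define ALIGN_READ(p)      (*(p))
#endif
#ifndef ALIGN_WRITE
#define ALIGN_WRITE(p, v)  (*(p) = (v))
#endif

/* Example use, mirroring an LZ back-reference copy loop. */
static void lz_copy_macros(unsigned char *dest, int lz_off, unsigned len)
{
    while (len--) {
        ALIGN_WRITE(dest, ALIGN_READ(dest + lz_off));
        dest++;
    }
}
```

The appeal of macros is zero overhead in the default build; the downside is that the redirection is a compile-time choice, not a per-stream one.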

This functionality you propose would be indeed very useful on small targets.

This idea was abandoned because the esp8266 Arduino OTA second stage happens at reboot time, when there is still enough temporary RAM available to uncompress and reflash the FW. I'd like to thank @pfalcon once again for this library. Credits are already given in the esp8266 Arduino main README.

@mikevoyt

mikevoyt commented Sep 15, 2020

It's been a while, but I believe this pull request accomplishes what you are looking for: "deflate a stream without a memory buffer". It allows reading the compressed input, and writing the decompressed output, a single byte at a time.

It's not a very clean PR, but I did put it into a production device for reading a compressed factory firmware image from SPI flash and uncompressing it a word at a time to write to internal program flash, and it's been working great for 4+ years. I should probably clean up the PR and re-submit it at some point.

@d-a-v

d-a-v commented Sep 16, 2020

Apparently, @mikevoyt, your idea is welcomed by the owner, but further work seems to be required for your PR to be accepted.

@pfalcon
Owner

pfalcon commented Sep 16, 2020

There was a change which I had in my git stash for quite some time, and which I have now cleaned up and pushed: 2aeab6c. If you look at it carefully, you'll notice that all it would take to support "copy LZ strings from a backing store [instead of memory]" is to allow overriding memcpy there with a user-defined function.
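That override could take roughly this shape (illustrative names, not the actual uzlib code): route the LZ string copy through a function pointer that defaults to a byte-by-byte copy. The byte-by-byte default also keeps the overlapping case correct, where an LZ match is longer than its distance and must re-read bytes it has just written, something a plain memcpy is not guaranteed to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*lz_copy_fn)(unsigned char *dst, const unsigned char *src, size_t n);

/* Default: forward byte-by-byte copy, safe for overlapping LZ copies
 * (distance < length). */
static void lz_copy_default(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--) *dst++ = *src++;
}

/* A port would point this at a routine reading from a backing store
 * (flash, EEPROM, ...) instead of RAM. */
static lz_copy_fn lz_string_copy = lz_copy_default;
```

For example, copying 4 bytes from 2 positions back after "ab" repeats the pair, yielding "ababab".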

If someone would like to prepare a patch, please follow https://github.com/pfalcon/change-control-manifesto . Thanks.

@pfalcon
Owner

pfalcon commented Sep 16, 2020

Oh, and I would like to remind you that zlib supports a configurable dictionary size, so it doesn't have to be 32KB. Powers of 2 from 512 bytes to 32KB are supported. As an example, Pycopy packages use a 4K dictionary: https://github.com/pfalcon/pycopy-lib/blob/master/sdist_upip.py#L34
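As a quick arithmetic check of those sizes (illustrative helper, not a zlib API): a Deflate window of `wbits` bits holds 2^wbits bytes, and the supported range of 9 to 15 bits covers 512 B up to 32 KB; wbits = 12 gives the 4 KB dictionary used by the Pycopy packages.

```c
#include <assert.h>

/* Dictionary (window) size in bytes for a given window-bits setting. */
static unsigned dict_size(unsigned wbits)
{
    return 1u << wbits;
}
```

So a stream compressed with a smaller window only ever emits back-references within that distance, and decompression needs a correspondingly smaller dictionary buffer.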

@maximevince
Author

Nice, I just rebased on top of your new commits.
Can you have a look at my proposed pull request #33?

@github-actions

github-actions bot commented May 9, 2021

Thanks for your submission. However there was no (further) activity
on it for some time. Below are possible reasons and/or means to
proceed further:

  • Please make sure that your submission contains clear and complete
    description of the issue and information required to understand and
    reproduce it (exact version of all components involved, etc.). Please
    refer to numerous guides on submitting helpful bugreports, e.g.
    https://www.chiark.greenend.org.uk/~sgtatham/bugs.html

  • Please make sure that your feature request/suggestion/change aligns
    well with the project purpose, goals, and process.

  • If you face the issue you report while working on a community-oriented
    open-source project, feel free to provide additional context and
    information - that may help to prioritize the issue better.

  • As with many open-source projects, this project is severely under-staffed
    and under-resourced. If you don't run a community-oriented open-source
    project and would like to get additional support/maintenance, feel
    free to request a support contract.

Thanks for your understanding!

@github-actions github-actions bot added the Stale label May 9, 2021
@github-actions

Closing due to inactivity.
