Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extmod/moddeflate.c: Add deflate.DeflateIO providing compression and decompression. #11905

Merged
merged 14 commits into from Jul 21, 2023

Conversation

jimmo
Copy link
Member

@jimmo jimmo commented Jun 30, 2023

This replaces #11879 -- because there was no way to provide a gzip.GzipFile that exactly matched CPython's (e.g. because we need the wbits parameter), instead we will put this functionality in a separate module/class. I have updated micropython/micropython-lib#694 to also provide a CPython-compatible gzip module.

Updated copy of the #11879 description below:


Summary: Adds gzip/zlib/raw-deflate compression support. (See issues #7228, #5590, and earlier PRs #8195, #5613).

Historically MicroPython provided a CPython-compatible implementation of zlib.decompress as well as a MicroPython-specific zlib.DecompIO which is similar to CPython's zlib.decompressobj except more useful as a stream wrapper, for example to decompress a file or socket, because you pull (read) decompressed data out of it, rather than pushing (write) compressed data into it.

However, CPython also provides gzip.GzipFile which works exactly like MicroPython's DecompIO, the only limitation is that it only supports gzip-format data (whereas DecompIO supported the same wbits argument that zlib.decompress supported).

This PR removes the zlib module, and replaces it with a deflate module providing a stream wrapper class DeflateIO that is essentially DeflateIO with compression support, but also suitable as a building block for implementing gzip.GzipFile. Because it no longer part of the zlib module, it also has a simpler API for specifying format and window size.

For backwards compatibility, I have written a pure-Python implementation of the functionality of the former zlib module (including now zlib.compress), as well as the gzip module (including compress, decompress, open, and GzipFile), which will be published to micropython-lib (micropython/micropython-lib#694) and can be optionally installed via mip (or frozen). Even though they are very small (~450 bytes each), they are not frozen by default, with the justification:

  • (De)compression isn't a feature that everyone uses.
  • Even in applications that use (de)compression, DeflateIO provides a much more memory-efficient way to work with reading/writing compressed data to/from a stream (which is the typical use case).
  • GzipFile not supporting the wbits argument makes it awkward to use in a general-purpose way.

This PR includes the following changes:

  • Add a shared mp_stream___exit___obj that avoids each file/stream-like object needing to implement this themselves. (This is a code size optimisation saving ~200 bytes)
  • Add a memory-efficient implementation of LZ77 compression to lib/uzlib (Written by @dpgeorge ).
  • General code size optimisations for lib/uzlib. (Saves around 650 bytes)
  • Code cleanup and simplication for lib/uzlib.
  • Remove the zlib module.
  • Add the deflate module.
  • Add tests for deflate.DeflateIO.
  • Update docs for zlib and add docs for gzip to describe the micropython-lib versions.
  • Add docs for deflate.

Overall this PR adds +1156 bytes to PYBV11 (i.e. adding compression support). This is made up of -2840 (remove zlib), +2928 (add decompression-only DeflateIO), +1068 (add compression support to DeflateIO).

Currently compression support (via MICROPY_PY_DEFLATE_COMPRESS) is enabled at the same level ("extra features") as decompression (via MICROPY_PY_DEFLATE), but I think we should consider making it either "full features" or "everything". I do think we should enable it on Unix/Windows though (but really I think that means we should make it "full" and move Unix/Windows to "full").

Compared to #8195 & #5613 which are quite similar (cc @harbaum, @andrewleech) this PR adds support for streaming as well as non-gzip (i.e. zlib/raw) formats and a code size reduction as well as reduced memory usage in the compressor (see #11879 (comment) for more notes from @dpgeorge).

This work was funded through GitHub Sponsors, by specific request from a supporter.

@github-actions
Copy link

github-actions bot commented Jun 30, 2023

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  -384 -0.048% standard[incl -32(data)]
      stm32:   -96 -0.025% PYBV10
     mimxrt:  +988 +0.276% TEENSY40
        rp2:   -88 -0.027% PICO
       samd:  +992 +0.382% ADAFRUIT_ITSYBITSY_M4_EXPRESS

@jimmo
Copy link
Member Author

jimmo commented Jun 30, 2023

Some quick performance tests:
On PYBV11, decompression is unchanged 781ms (from 783ms) to decompress a 7.4kiB gzip file (into 14kiB) 25 times (using DeflateIO vs DecompIO from BytesIO using readinto to avoid allocations).
On unix, decompression on MicroPython (DeflateIO) is roughly 7 times slower than in Python (gzip.GzipFile). 465ms vs 65ms for 1000 iterations of the same test.

For compression (on the unix port), MicroPython (wbits=8) is about 8x slower than Python (wbits=15) to compress the same file. (4x if you use wbits=6).

In terms of bytes/second:
On Unix, CPython compresses at about 45MiB/s, whereas MicroPython at about 2.8MiB/s. CPython decompresses at about 110MiB/s, whereas MicroPython is about 18MiB/s.
On PYBV11, MicroPython compresses at 63kiB/s and decompresses at 245kiB/s.

MicroPython's decompression performance is impacted slightly by not buffering its input as it reads byte at a time from the underlying stream. On PYBV11 it can be increased to 280kiB/s by adding a 4 byte buffer. Diminishing marginal returns for a bigger buffer. Similar relative gain on Unix (22MiB/s).

MicroPython's compression performance is limited by the linear searching in the history window (it's not meant to be fast, it's meant to use less memory).

@andrewleech
Copy link
Sponsor Contributor

I'm really excited to see this coming together, thanks @jimmo !

@codecov
Copy link

codecov bot commented Jul 3, 2023

Codecov Report

Merging #11905 (ea1a5e4) into master (14c2b64) will increase coverage by 0.00%.
The diff coverage is 98.33%.

@@           Coverage Diff            @@
##           master   #11905    +/-   ##
========================================
  Coverage   98.41%   98.41%            
========================================
  Files         155      156     +1     
  Lines       20564    20696   +132     
========================================
+ Hits        20238    20368   +130     
- Misses        326      328     +2     
Impacted Files Coverage Δ
extmod/modbinascii.c 100.00% <ø> (ø)
extmod/vfs_fat_file.c 95.50% <ø> (-0.15%) ⬇️
extmod/vfs_lfsx_file.c 94.39% <ø> (-0.16%) ⬇️
extmod/vfs_posix_file.c 91.95% <ø> (-0.27%) ⬇️
lib/uzlib/adler32.c 100.00% <ø> (ø)
lib/uzlib/crc32.c 100.00% <ø> (ø)
py/builtinimport.c 100.00% <ø> (ø)
py/objstringio.c 92.38% <ø> (-0.22%) ⬇️
py/stream.h 100.00% <ø> (ø)
lib/uzlib/tinflate.c 93.75% <93.75%> (-0.94%) ⬇️
... and 6 more

... and 1 file with indirect coverage changes

@dpgeorge
Copy link
Member

Re performance: compiling relevant lib/uzlib files with -O2 will probably go a long way to improve performance.

@dpgeorge dpgeorge added this to the release-1.21.0 milestone Jul 11, 2023
// -----CMF------ ----------FLG---------------
// CINFO(5) CM(3) FLEVEL(2) FDICT(1) FCHECK(5)
uint8_t buf[] = { 0x08, 0x80 }; // CM=2 (deflate), FLEVEL=2 (default), FDICT=0 (no dictionary)
buf[0] |= MAX(self->window_bits - 8, 1) << 4; // base-2 logarithm of the LZ77 window size, minus eight.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use wbits? In case self->window_bits==0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed.

static const mp_arg_t allowed_args[] = {
{ MP_QSTR_stream, MP_ARG_REQUIRED | MP_ARG_OBJ, {.u_obj = MP_OBJ_NULL} },
{ MP_QSTR_format, MP_ARG_INT, {.u_int = DEFLATEIO_FORMAT_NONE} },
{ MP_QSTR_wbits, MP_ARG_INT, {.u_int = 0} },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the default here should be DEFLATEIO_DEFAULT_WBITS, and then the code that checks for wbits==0 elsewhere in this file can be removed?

Edit: or does wbits=0 mean something special?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: or does wbits=0 mean something special?

Yes.

{ MP_QSTR_close, MP_ARG_KW_ONLY | MP_ARG_OBJ, {.u_rom_obj = MP_ROM_FALSE} },
};
mp_arg_val_t args[MP_ARRAY_SIZE(allowed_args)];
mp_arg_parse_all_kw_array(n_args, n_kw, all_args, MP_ARRAY_SIZE(allowed_args), allowed_args, args);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible make these positional-only args, to reduce code size and allow a natmod version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dpgeorge
Copy link
Member

I added a new test for stream errors.

STATIC mp_uint_t deflateio_read(mp_obj_t o_in, void *buf, mp_uint_t size, int *errcode) {
mp_obj_deflateio_t *self = MP_OBJ_TO_PTR(o_in);

if (!deflateio_init_read(self) || self->stream == MP_OBJ_NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks should be swapped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jimmo jimmo force-pushed the extmod-deflate branch 2 times, most recently from ea972d4 to b67d30e Compare July 14, 2023 07:26
@jimmo
Copy link
Member Author

jimmo commented Jul 14, 2023

Added deflate as a dynamic native module example (to replace the old zlib version).

@jimmo
Copy link
Member Author

jimmo commented Jul 14, 2023

Made compression enabled at the "full" level by default, and updated docs to match.

@jimmo jimmo force-pushed the extmod-deflate branch 3 times, most recently from 680da47 to a78e5be Compare July 14, 2023 08:04
docs/library/deflate.rst Outdated Show resolved Hide resolved
extmod/moddeflate.c Outdated Show resolved Hide resolved
ports/nrf/modules/os/microbitfs.c Outdated Show resolved Hide resolved
lib/uzlib/uzlib.h Show resolved Hide resolved
@jimmo
Copy link
Member Author

jimmo commented Jul 21, 2023

Updated to address comments. Added AUTO as a module constant.

jimmo and others added 3 commits July 21, 2023 18:48
This will be replaced with a new deflate module providing the same
functionality, with an optional frozen Python wrapper providing a
replacement zlib module.

binascii.crc32 is temporarily disabled.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
There are enough places that implement __exit__ by forwarding directly to
mp_stream_close that this saves code size.

For the cases where __exit__ is a no-op, additionally make their
MP_STREAM_CLOSE ioctl handled as a no-op.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
The compression algorithm implemented in this commit uses much less memory
compared to the standard way of implementing it using a hash table and
large look-back window.  In particular the algorithm here doesn't allocate
hash table to store indices into the history of the previously seen text.
Instead it simply does a brute-force-search of the history text to find a
match for the compressor.  This is slower (linear search vs hash table
lookup) but with a small enough history (eg 512 bytes) it's not that slow.
And a small history does not impact the compression too much.

To give some more concrete numbers comparing memory use between the
approaches:

- Standard approach: inplace compression, all text to compress must be in
  RAM (or at least memory addressable), and then an additional 16k bytes
  RAM of hash table pointers, pointing into the text

- The approach in this commit: streaming compression, only a limited amount
  of previous text must be in RAM (user selectable, defaults to 512 bytes).

To compress, say, 1k of data, the standard approach requires all that data
to be in RAM, plus an additional 16k of RAM for the hash table pointers.
With this commit, you only need the 1k of data in RAM.  Or if it's
streaming from a file (or elsewhere), you could get away with only 256
bytes of RAM for the sliding history and still get very decent compression.

In summary: because compression takes such a large amount of RAM (in the
standard algorithm) and it's not really suitable for microcontrollers, the
approach taken in this commit is to minimise RAM usage as much as possible,
and still have acceptable performance (speed and compression ratio).

Signed-off-by: Damien George <damien@micropython.org>
Because we only use the streaming source, this is just extra code size.

Saves 64 bytes on PYBV11.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This commit makes the following changes:
- Replace 256-byte reverse-bits-in-byte lookup table with computation.
- Replace length and distance code lookup tables with computation.
- Remove comp_disabled check (it's unused).
- Make the dest_write_cb take the data pointer directly, rather than the
  Outbuf.

Saves 500 bytes on PYBV11.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This library used a mix of "tinf" and "uzlib" to refer to itself.  Remove
all use of "tinf" in the public API.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This supports `wbits` values between +40 to +47.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Saves 68 bytes on PYBV11.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Collapsing the two adjacent calls to outbits saves 32 bytes.

Bringing defl_static.c into lz77.c allows better inlining, saves 24 bytes.

Merge the Outbuf/uzlib_lz77_state_t structs, a minor simplification that
doesn't change code size.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
For better abstraction for users of this API.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This provides similar functionality to the former zlib.DecompIO and
especially CPython's gzip.GzipFile for both compression and decompression.

This class can be used directly, and also can be used from Python to
implement (via io.BytesIO) zlib.decompress and zlib.compress, as well as
gzip.GzipFile.

Enable/disable this on all ports/boards that zlib was previously configured
for.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Also update zlib & gzip docs to describe the micropython-lib modules.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This replaces the previous zlib version.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
@dpgeorge dpgeorge merged commit ea1a5e4 into micropython:master Jul 21, 2023
41 of 43 checks passed
@dpgeorge
Copy link
Member

Now merged! Thanks @jimmo for all the hard work on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants