Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py/persistentcode: Full set of optimisations to reduce mpy file size #4564

Merged
merged 4 commits into from Mar 5, 2019

Conversation

dpgeorge
Copy link
Member

@dpgeorge dpgeorge commented Mar 1, 2019

This PR combines and improves upon previous PRs to reduce size of mpy files: #4473 (static qstrs), #4549 (qstr window).

It also adds a further optimisation (second commit) to pack the qstr data directly into the bytecode instead of writing it at the end of the bytecode. This allows to shrink the mpy file by about 2 bytes for every opcode that has a qstr argument.

Eg, for the sequence (LOAD_GLOBAL 'print', CALL_FUNCTION 0), instead of encoding it as 0x1c 0x00 0x00 0x64 0x00 0x05 p r i n t it's now encoded as 0x1c 0x05 p r i n t 0x64 0x00 (the 0x05 is the length of the string "print").

The mpy saver can write such a stream in one pass without needing any additional temporary memory, and same goes for the mpy loader.

Size reduction is about 25% for the qstr window, then a further 10% for packed qstrs in bytecode, then a further 5% with the addition of a static qstr set.

On a bunch of 19 test py scripts (including uasyncio, upip, lcd160cr), the sum of all 19 mpy file sizes went from 60064 bytes before the patches here down to 39185 bytes.

@pfalcon
Copy link
Contributor

pfalcon commented Mar 2, 2019

Unexpected twist. One may only wonder why it wasn't done like that from the beginning, and look for premonitions like "but before we could read the whole bytecode efficiently with one read() call", or "but before we could mmap() bytecode".

Skipping that, one can point that there's still no support for specifying the exact window size (in ones, not in powers of two).

@dpgeorge
Copy link
Member Author

dpgeorge commented Mar 4, 2019

"but before we could read the whole bytecode efficiently with one read() call", or "but before we could mmap() bytecode".

Well, everything is a compromise / trade off. Efficiency concerns with multiple read() calls can be mitigated with a buffer (and vfs_reader already has one). Not being able to mmap a .mpy file is a valid concern, but I don't think it's an important one: since the data has to anyway be modified the mmap needs to be MAP_PRIVATE and it's really just a potential speed optimisation to load it, not one to reduce memory usage (and on bare-metal mmap doesn't really exist, and if it does then it must anyway allocate RAM, it can't map ROM).

I tested the speed of importing lcd160cr.mpy (a large file) before and after the set of patches here, and got:

        before  after
PYBv1.0  40 ms   20 ms
esp8266 226 ms  161 ms

So the import speed is reduced by a lot, most likely due to less handling of qstrs.

Skipping that, one can point that there's still no support for specifying the exact window size (in ones, not in powers of two).

The reason for a power-of-two is to be able to use & instead of % in the window computation. But if you really think it's an important option to have...

@pfalcon
Copy link
Contributor

pfalcon commented Mar 4, 2019

But if you really think it's an important option to have...

Well, that was a "feature request" from me from the start, and given all the other refactors made, hopefully this one can fit in too.

The reason for a power-of-two is to be able to use & instead of % in the window computation.

Otherwise it's just an if to wrap the index.

Thanks for other repsonses and figures - I agree with all of them.

@dpgeorge
Copy link
Member Author

dpgeorge commented Mar 4, 2019

Well, that was a "feature request" from me from the start, and given all the other refactors made, hopefully this one can fit in too.

Ok, I pushed a commit implementing this. It needs further testing though (and squashing).

This is an implementation of a sliding qstr window used to reduce the
number of qstrs stored in a .mpy file.  The window size is configured to 32
entries which takes a fixed 64 bytes (16-bits each) on the C stack when
loading/saving a .mpy file.  It allows to remember the most recent 32 qstrs
so they don't need to be stored again in the .mpy file.  The qstr window
uses a simple least-recently-used mechanism to discard the least recently
used qstr when the window overflows (similar to dictionary compression).
This scheme only needs a single pass to save/load the .mpy file.

Reduces mpy file size by about 25% with a window size of 32.
Instead of emitting two bytes in the bytecode for where the linked qstr
should be written to, it is now replaced by the actual qstr data, or a
reference into the qstr window.

Reduces mpy file size by about 10%.
When encoded in the mpy file, if qstr <= QSTR_LAST_STATIC then store two
bytes: 0, static_qstr_id.  Otherwise encode the qstr as usual (either with
string data or a reference into the qstr window).

Reduces mpy file size by about 5%.
@dpgeorge
Copy link
Member Author

dpgeorge commented Mar 5, 2019

Merged in 5996eeb through 62483bb

@dpgeorge dpgeorge closed this Mar 5, 2019
@dpgeorge dpgeorge deleted the py-mpy-size-reduction branch March 5, 2019 05:49
@dpgeorge dpgeorge merged commit 62483bb into micropython:master Mar 5, 2019
@pfalcon
Copy link
Contributor

pfalcon commented Mar 5, 2019

Thanks!

tannewt added a commit to tannewt/circuitpython that referenced this pull request Apr 21, 2021
[build] simplify makeqstrdata heuristic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants