New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
py/persistentcode: Full set of optimisations to reduce mpy file size #4564
Conversation
Unexpected twist. One may only wonder why it wasn't done like that from the beginning, and look for premonitions like "but before we could read the whole bytecode efficiently with one read() call", or "but before we could mmap() bytecode". Skipping that, one can point that there's still no support for specifying the exact window size (in ones, not in powers of two). |
Well, everything is a compromise / trade off. Efficiency concerns with multiple read() calls can be mitigated with a buffer (and vfs_reader already has one). Not being able to mmap a .mpy file is a valid concern, but I don't think it's an important one: since the data has to anyway be modified the mmap needs to be MAP_PRIVATE and it's really just a potential speed optimisation to load it, not one to reduce memory usage (and on bare-metal mmap doesn't really exist, and if it does then it must anyway allocate RAM, it can't map ROM). I tested the speed of importing lcd160cr.mpy (a large file) before and after the set of patches here, and got:
So the import speed is reduced by a lot, most likely due to less handling of qstrs.
The reason for a power-of-two is to be able to use |
Well, that was a "feature request" from me from the start, and given all the other refactors made, hopefully this one can fit in too.
Otherwise it's just an Thanks for other repsonses and figures - I agree with all of them. |
Ok, I pushed a commit implementing this. It needs further testing though (and squashing). |
This is an implementation of a sliding qstr window used to reduce the number of qstrs stored in a .mpy file. The window size is configured to 32 entries which takes a fixed 64 bytes (16-bits each) on the C stack when loading/saving a .mpy file. It allows to remember the most recent 32 qstrs so they don't need to be stored again in the .mpy file. The qstr window uses a simple least-recently-used mechanism to discard the least recently used qstr when the window overflows (similar to dictionary compression). This scheme only needs a single pass to save/load the .mpy file. Reduces mpy file size by about 25% with a window size of 32.
Instead of emitting two bytes in the bytecode for where the linked qstr should be written to, it is now replaced by the actual qstr data, or a reference into the qstr window. Reduces mpy file size by about 10%.
When encoded in the mpy file, if qstr <= QSTR_LAST_STATIC then store two bytes: 0, static_qstr_id. Otherwise encode the qstr as usual (either with string data or a reference into the qstr window). Reduces mpy file size by about 5%.
6d1df83
to
62483bb
Compare
Thanks! |
[build] simplify makeqstrdata heuristic
This PR combines and improves upon previous PRs to reduce size of mpy files: #4473 (static qstrs), #4549 (qstr window).
It also adds a further optimisation (second commit) to pack the qstr data directly into the bytecode instead of writing it at the end of the bytecode. This allows to shrink the mpy file by about 2 bytes for every opcode that has a qstr argument.
Eg, for the sequence (LOAD_GLOBAL 'print', CALL_FUNCTION 0), instead of encoding it as
0x1c 0x00 0x00 0x64 0x00 0x05 p r i n t
it's now encoded as0x1c 0x05 p r i n t 0x64 0x00
(the 0x05 is the length of the string "print").The mpy saver can write such a stream in one pass without needing any additional temporary memory, and same goes for the mpy loader.
Size reduction is about 25% for the qstr window, then a further 10% for packed qstrs in bytecode, then a further 5% with the addition of a static qstr set.
On a bunch of 19 test py scripts (including uasyncio, upip, lcd160cr), the sum of all 19 mpy file sizes went from 60064 bytes before the patches here down to 39185 bytes.