mp_raw_code_load/mp_raw_code_save - enable to execute from flash without loading in RAM #4124

Closed
adritium opened this issue Sep 11, 2018 · 7 comments


@adritium

What changes need to be made to enable this feature?

@dpgeorge
Member

There is only one reason that bytecode in .mpy files needs to be loaded into RAM: because the qstr values embedded in the saved bytecode must be rewritten to match those of the VM/runtime (this is essentially the linking stage, resolving symbols).
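To make the "linking" step concrete, here is a minimal sketch of load-time qstr rewriting. It assumes a deliberately simplified, hypothetical bytecode format: 1-byte opcodes, with qstr operands stored as 2-byte little-endian indices local to the file. The opcode values and the `link_into_ram()` name are illustrative, not MicroPython's actual `.mpy` layout or API. Because the operands are overwritten, the bytecode has to be copied to RAM first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical toy format (NOT the real .mpy layout): 1-byte opcodes; the
// opcodes below that take a qstr are followed by a 2-byte little-endian
// qstr index that is local to the .mpy file.
enum { OP_LOAD_NAME = 0x10, OP_LOAD_ATTR = 0x11, OP_RETURN = 0x20 };

static int op_has_qstr(uint8_t op) {
    return op == OP_LOAD_NAME || op == OP_LOAD_ATTR;
}

// Copy bytecode out of read-only storage into a RAM buffer, then overwrite
// each file-local qstr index with the runtime qstr value. This in-place
// rewrite ("linking") is why the bytecode cannot stay in flash.
void link_into_ram(const uint8_t *rom, size_t len, uint8_t *ram,
                   const uint16_t *file_to_runtime, size_t n_map) {
    memcpy(ram, rom, len);
    size_t ip = 0;
    while (ip < len) {
        uint8_t op = ram[ip++];
        if (!op_has_qstr(op)) {
            continue;
        }
        if (ip + 2 > len) {
            break; // truncated operand; a real loader would report an error
        }
        uint16_t local = (uint16_t)(ram[ip] | (ram[ip + 1] << 8));
        assert(local < n_map); // debug check against the file's qstr count
        uint16_t runtime = file_to_runtime[local];
        ram[ip] = (uint8_t)(runtime & 0xff);
        ram[ip + 1] = (uint8_t)(runtime >> 8);
        ip += 2;
    }
}
```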

It is possible to change this so that bytecode is read-only and doesn't need modification. It would require making a qstr translation table for each bytecode block, to translate between the qstr values in the .mpy file and the qstr values in the VM/runtime. This translation table would need to live in RAM, but otherwise the bytecode can stay in ROM/flash/etc. The downsides to this approach: 1) it takes more RAM if the bytecode is also in RAM (the usual case when executing from the REPL or from a .py file); 2) it decreases performance of the VM due to the extra lookup in the table for each opcode that uses a qstr (this is a hit for both .mpy and .py).
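A minimal sketch of the translation-table alternative, again using a hypothetical toy format with 2-byte little-endian qstr operands: the bytecode pointer refers to flash and is never written, while a small per-block table in RAM maps file-local indices to runtime qstrs. `code_block_t` and `decode_qstr_via_table()` are illustrative names, not MicroPython API. The extra table load inside the decoder is the per-opcode cost described as downside 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical per-code-block descriptor: bytecode stays in ROM/flash,
// only the qstr translation table is allocated in RAM at load time.
typedef struct {
    const uint8_t *bytecode;    // points into ROM/flash; never modified
    const uint16_t *qstr_table; // lives in RAM, one entry per file-local qstr
    size_t n_qstr;
} code_block_t;

// Decode a 2-byte qstr operand at *ip, advance *ip past it, and translate
// the file-local index to a runtime qstr. Every qstr-using opcode pays for
// this extra lookup.
uint16_t decode_qstr_via_table(const code_block_t *cb, const uint8_t **ip) {
    uint16_t local = (uint16_t)((*ip)[0] | ((*ip)[1] << 8));
    *ip += 2;
    assert(local < cb->n_qstr);
    return cb->qstr_table[local];
}
```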

There would be a way to retain performance and low RAM usage of existing REPL/.py files, and also have .mpy in ROM: for each existing opcode which needs a qstr add a new opcode that gets that qstr via the translation table. The following opcodes would need to be added (because they take a qstr as an argument):

MP_BC_LOAD_CONST_STRING_VIA_TABLE
MP_BC_LOAD_METHOD_VIA_TABLE
MP_BC_LOAD_SUPER_METHOD_VIA_TABLE
MP_BC_LOAD_NAME_VIA_TABLE
MP_BC_LOAD_GLOBAL_VIA_TABLE
MP_BC_LOAD_ATTR_VIA_TABLE
MP_BC_STORE_NAME_VIA_TABLE
MP_BC_STORE_GLOBAL_VIA_TABLE
MP_BC_STORE_ATTR_VIA_TABLE
MP_BC_DELETE_NAME_VIA_TABLE
MP_BC_DELETE_GLOBAL_VIA_TABLE
MP_BC_IMPORT_NAME_VIA_TABLE
MP_BC_IMPORT_FROM_VIA_TABLE
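The dual-opcode idea can be sketched with a toy dispatch loop. The opcode values and the `run()` helper are hypothetical, and only the `LOAD_NAME` pair is shown: code compiled at runtime keeps using the existing opcode, which reads the runtime qstr directly, while only in-place `.mpy` code uses the `_VIA_TABLE` variant and pays for the lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical opcode values; the real MP_BC_* numbers differ.
enum {
    OP_LOAD_NAME = 0x1c,
    OP_LOAD_NAME_VIA_TABLE = 0x9c,
    OP_RETURN = 0x5b,
};

// Toy interpreter: records the resolved qstr of every load into out[] and
// returns how many were recorded once it reaches OP_RETURN (or any opcode
// it does not know).
size_t run(const uint8_t *ip, const uint16_t *qstr_table, uint16_t *out) {
    size_t n = 0;
    for (;;) {
        switch (*ip++) {
        case OP_LOAD_NAME:
            // Compiled in RAM (REPL/.py): the operand is already a runtime
            // qstr, so there is no table lookup and no slowdown.
            out[n++] = (uint16_t)(ip[0] | (ip[1] << 8));
            ip += 2;
            break;
        case OP_LOAD_NAME_VIA_TABLE: {
            // Only emitted in .mpy code executing from flash: the operand
            // is a file-local index resolved through the RAM table.
            assert(qstr_table != NULL);
            uint16_t local = (uint16_t)(ip[0] | (ip[1] << 8));
            out[n++] = qstr_table[local];
            ip += 2;
            break;
        }
        case OP_RETURN:
        default:
            return n;
        }
    }
}
```

The cost of this design is VM code size: each of the 13 opcodes listed above needs a second handler.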

@adritium
Author

@dpgeorge thanks for your reply!

@adritium
Author

@dpgeorge since RAM is a constant complaint of platforms MicroPython runs on, this sounds like a no-brainer.

Right?

@adritium
Author

> The following opcodes would need to be added (because they take a qstr as an argument):

For my understanding: would those opcodes exist only in the .mpy?

@dpgeorge
Member

> since RAM is a constant complaint of platforms MicroPython runs on, this sounds like a no-brainer.
> Right?

Not really. As I said above, implementing this feature would increase RAM usage for all code that is not in a .mpy file due to the additional qstr translation table (also for code that is in a .mpy but can't be executed from where it is stored, eg an SD card).

The other big issue to solve would be making sure that .mpy's live in a location that is 1) memory mapped to the CPU; 2) contiguous. For the stm32 port this means storing in internal flash, or external memory-mapped QSPI flash, and using a new filesystem that can store files contiguously. For esp8266 it means you can only use the flash below 1MiB for this kind of storage (because that's the only region that is memory mapped). Other systems will have similar constraints.
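A sketch of the check a port might perform before executing a file in place, assuming the filesystem can report that a file is stored as a single contiguous extent at some flash offset. The `xip_window_t` type and `can_execute_in_place()` function are hypothetical. The esp8266 values are illustrative: flash offset 0 appearing at CPU address 0x40200000, with only a 1 MiB window mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical description of a memory-mapped (CPU-visible) flash window.
typedef struct {
    uintptr_t flash_offset; // first flash offset covered by the window
    uintptr_t cpu_base;     // CPU address at which that offset appears
    size_t size;            // window length in bytes
} xip_window_t;

// Illustrative esp8266-style window: only the first 1 MiB of flash is mapped.
static const xip_window_t esp8266_window = {0x0, 0x40200000u, 1024u * 1024u};

// A file stored as one contiguous extent [flash_offset, flash_offset + len)
// can run in place only if the whole extent lies inside the window. On
// success *cpu_addr receives the address the VM would read bytecode from.
bool can_execute_in_place(const xip_window_t *w, uintptr_t flash_offset,
                          size_t len, uintptr_t *cpu_addr) {
    assert(w != NULL && cpu_addr != NULL);
    if (flash_offset < w->flash_offset) {
        return false;
    }
    uintptr_t rel = flash_offset - w->flash_offset;
    if (rel > w->size || len > w->size - rel) {
        return false; // starts or ends outside the mapped region
    }
    *cpu_addr = w->cpu_base + rel;
    return true;
}
```

A fragmented file has no single extent to pass in at all, which is why a filesystem that stores files contiguously is needed.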

> For my understanding: would those opcodes exist only in the .mpy?

Only the .mpy will use these opcodes, but the VM still needs to implement them which means a moderate increase in code/firmware size. A way to improve this (reduce VM code size) would be to modify existing opcodes that have a qstr argument, so that when they decode the qstr value from the bytecode they check if it needs translation (eg if the high bit of the qstr is set then it is a relative qstr and needs to be translated using the qstr table). This approach would also allow mp_raw_code_load() to pick a strategy when loading a .mpy file: 1) if the bytecode is not memory mapped or contiguous and must be loaded into RAM anyway then it can translate the qstrs as it loads the bytecode [this is how it already works]; 2) if the bytecode is memory mapped and contiguous then it leaves it in ROM, takes a pointer to this ROM, and creates a qstr translation table in RAM, allowing the VM to translate them on the fly when it executes the code.
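A minimal sketch of the high-bit variant, assuming 16-bit qstr operands and that runtime qstr values never exceed 0x7fff, so the top bit is free to act as the "relative" flag. `QSTR_RELATIVE_FLAG`, `resolve_qstr()` and `choose_strategy()` are hypothetical names. The second function mirrors the two loading strategies listed for `mp_raw_code_load()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical encoding: a qstr operand with the high bit set is a
// file-relative index into the code block's translation table; otherwise
// it is already a runtime qstr value.
#define QSTR_RELATIVE_FLAG 0x8000u

static inline uint16_t resolve_qstr(uint16_t operand, const uint16_t *table,
                                    size_t n_table) {
    if (operand & QSTR_RELATIVE_FLAG) {
        uint16_t idx = (uint16_t)(operand & ~QSTR_RELATIVE_FLAG);
        assert(table != NULL && idx < n_table);
        return table[idx];
    }
    // Compiled in RAM (REPL/.py) or linked while loading: use as-is.
    return operand;
}

typedef enum { LOAD_COPY_AND_LINK, LOAD_IN_PLACE_WITH_TABLE } load_strategy_t;

// Strategy 1: copy to RAM and translate qstrs while loading (current way).
// Strategy 2: keep bytecode in ROM and build a qstr table in RAM.
load_strategy_t choose_strategy(bool memory_mapped, bool contiguous) {
    return (memory_mapped && contiguous) ? LOAD_IN_PLACE_WITH_TABLE
                                         : LOAD_COPY_AND_LINK;
}
```

Since existing opcodes gain only a branch on one bit, no new opcode handlers are needed, at the cost of that branch on every qstr decode.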

@tve
Contributor

tve commented May 5, 2020

I'm wondering whether there is a different solution, which may be easier to implement but perhaps less flexible.

Assume that there is a designated area in flash for execute-from-flash ("EFF") modules: modules must be written into that area explicitly similar to the way one would write a module into the filesystem. (Maybe the area could be mounted into the filesystem namespace...)

When a new EFF module is written, its qstrs are resolved only against qstr tables that are in flash, i.e. against the constant table in the firmware and against previous EFF modules. New qstrs are written to a table in flash that is not immediately part of the qstr table chain. To make the new qstrs part of the chain a reset is required, so a newly written EFF module can only be used after a reset, which then initializes the qstr chain to include the new table (or new entries). RAM-allocated qstrs always come on top of all these ROM/flash qstrs.

A natural result of this is that EFF modules form a stack. Only the top-most module can be removed (at a time) and it can really only be marked for removal to be erased on the next reset because qstr table entries used by code resident in RAM may point into it. (I believe this is the case with the qstr translation table as well unless all the strings themselves are copied to RAM.)

All this would lead to a stack model where EFF modules are pushed onto the stack, a reset is necessary before they can be used, and they can be popped off the stack again but only erased after another reset.

I believe that the translation table approach has the advantage that a newly written module can immediately be executed, but I believe it shares the same limitations when it comes to removal, including the stack property (edit: probably not, too late to think more).
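The EFF stack model can be illustrated as a small state machine. This is one reading of the proposal, not an implementation of it, and every name here (`eff_stack_t`, `eff_push()`, `EFF_MAX`, etc.) is hypothetical. A pushed module stays pending until the next reset links its qstr table into the chain; only the top module may be marked for removal, and it is erased at the following reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EFF_MAX 8 // hypothetical capacity of the EFF flash area, in modules

typedef enum { EFF_PENDING, EFF_ACTIVE, EFF_REMOVE_ON_RESET } eff_state_t;

typedef struct {
    eff_state_t state[EFF_MAX];
    size_t depth; // number of modules currently in the flash area
} eff_stack_t;

// Writing a module pushes it. Its new qstrs are not yet in the qstr chain,
// so it stays PENDING until the next reset. Pushing above a module that is
// waiting to be erased is refused.
bool eff_push(eff_stack_t *s) {
    if (s->depth == EFF_MAX) {
        return false;
    }
    if (s->depth > 0 && s->state[s->depth - 1] == EFF_REMOVE_ON_RESET) {
        return false;
    }
    s->state[s->depth++] = EFF_PENDING;
    return true;
}

// Only the top-most module can be removed, and only lazily: code already in
// RAM may still reference its qstrs, so it is erased at the next reset.
bool eff_mark_top_for_removal(eff_stack_t *s) {
    if (s->depth == 0) {
        return false;
    }
    s->state[s->depth - 1] = EFF_REMOVE_ON_RESET;
    return true;
}

// Reset: erase the top module if it was marked, then link every remaining
// module's qstr table into the chain, making pending modules usable.
void eff_reset(eff_stack_t *s) {
    assert(s->depth <= EFF_MAX);
    while (s->depth > 0 && s->state[s->depth - 1] == EFF_REMOVE_ON_RESET) {
        s->depth--;
    }
    for (size_t i = 0; i < s->depth; i++) {
        s->state[i] = EFF_ACTIVE;
    }
}

// Whether module i may be imported right now.
bool eff_usable(const eff_stack_t *s, size_t i) {
    return i < s->depth && s->state[i] == EFF_ACTIVE;
}
```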

tannewt added a commit to tannewt/circuitpython that referenced this issue Feb 9, 2021
Add display init code for Lilygo TTGO T8 ESP32-S2
@dpgeorge
Member

Static bytecode/.mpy files with a qstr indirection table were implemented in f2040bf


3 participants