@dpgeorge
Member

Summary

This PR adds the marshal module, with marshal.dumps() and marshal.loads() functions. These functions can serialize/unserialize Python objects, but for now only bytecode functions are supported. The semantics of this module match CPython.

Motivation: the original goal was to serialize an existing function, send it over a network connection (or otherwise) to a remote MicroPython device, then unserialize and execute it there. This way one MicroPython device can dynamically execute code on another MicroPython device.

The original implementation was a bit simpler than this PR: it was a dedicated pair of functions in the micropython module, used only to serialize/unserialize a (bytecode) function. E.g.:

import micropython

def foo():
    print('hello')

serial_foo = micropython.serialize_function(foo)
# send serial_foo somewhere else

# ...
# on the device somewhere else:
foo = micropython.unserialize_function(serial_foo, globals())
foo()

But instead of a MicroPython-specific API, it's better to match an existing CPython API where possible. In this case the marshal module does almost exactly what is needed to serialize/unserialize bytecode functions. The main difference is that marshal works with code objects, not functions. So the above code becomes:

import marshal

def foo():
    print('hello')

serial_foo = marshal.dumps(foo.__code__)

# ...
# at the other end
fun_type = type(lambda: 0)
code = marshal.loads(serial_foo)
foo = fun_type(code, globals())
foo()

Using marshal is a little more involved because you have to know about code objects (which compile() also returns, for example). But at least using marshal is fully compatible with CPython.

This PR implements the marshal module, and with it the marshal example above works in MicroPython.
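As a rough illustration of the two routes to a code object mentioned above (an existing function's `__code__`, and `compile()`), here is a minimal sketch. The helper names `pack_function` and `unpack_function` are hypothetical and not part of this PR, and `types.FunctionType` is used in place of `type(lambda: 0)`. The sketch was written against CPython semantics; it is an assumption that it behaves the same on a MicroPython build with marshal enabled.

```python
import marshal
import types


def pack_function(func):
    """Serialize a bytecode function's code object to bytes (hypothetical helper)."""
    return marshal.dumps(func.__code__)


def unpack_function(data, globals_dict):
    """Rebuild a callable from marshalled bytes (hypothetical helper)."""
    code = marshal.loads(data)
    return types.FunctionType(code, globals_dict)


def add_one(x):
    return x + 1


# Function route: the code object comes from an existing function.
payload = pack_function(add_one)
remote_add_one = unpack_function(payload, {})
result_from_function = remote_add_one(41)

# compile() route: the code object is built from source text and run with exec().
module_code = compile("answer = 6 * 7", "<remote>", "exec")
namespace = {}
exec(marshal.loads(marshal.dumps(module_code)), namespace)
result_from_compile = namespace["answer"]
```

In a real deployment `payload` would be sent to the other device between the pack and unpack steps; the transport is left out here because the conversation does not prescribe one.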

Testing

Tests are added to CI, and run on the unix coverage variant.

Trade-offs and Alternatives

The major trade-off/alternative here is a MicroPython-specific API vs a CPython-compatible API, namely the marshal module.

Using a MicroPython-specific API, eg micropython.serialize_function() and micropython.unserialize_function():

  • Pro: No need to expose function.__code__.
  • Pro: No need to add the ability to create functions from the function type, like type(lambda:0)(code_object, globals()).
  • Pro: No added complexity to support a more sophisticated code object.
  • Con: Not CPython compatible.
  • Con: Not as general as the CPython/marshal approach.

Implementing the marshal module:

  • Pro: CPython compatible and more general, eg works alongside the compile() function.
  • Pro: Get an efficient implementation of function.__code__, which is what #12280 ("py/objfun: Add function.__code__ attribute") attempted to do.
  • Con: Quite a bit more complicated in the implementation, and may use a little more RAM at runtime to create additional objects when marshalling/unmarshalling.

Also note that the pickle module is no good here, because it cannot serialize functions. It simply serializes a reference to a function, which must already be in scope when the reference is unserialized.
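The difference between pickle and marshal can be seen in a small CPython sketch. The module name `demo_mod` and the function `greet` are made up for the demonstration; nothing here is taken from the PR's code.

```python
import marshal
import pickle
import sys
import types

# Build a throwaway module (hypothetical name "demo_mod") holding one function.
mod = types.ModuleType("demo_mod")
exec("def greet():\n    return 'hello'\n", mod.__dict__)
sys.modules["demo_mod"] = mod

pickled = pickle.dumps(mod.greet)
marshalled = marshal.dumps(mod.greet.__code__)

# pickle stores only the module and function names, not the function body.
pickle_has_name = b"greet" in pickled
pickle_has_body = b"hello" in pickled
# marshal stores the code object itself, including its string constant.
marshal_has_body = b"hello" in marshalled

# Remove the original function: the pickled reference can no longer be resolved...
del mod.greet
try:
    pickle.loads(pickled)
    pickle_restored = True
except AttributeError:
    pickle_restored = False

# ...but the marshalled code object still rebuilds into a working function.
restored = types.FunctionType(marshal.loads(marshalled), {})
restored_result = restored()

# Clean up the throwaway module.
del sys.modules["demo_mod"]
```

The pickled bytes are only useful on a receiver that already has `demo_mod.greet` defined, which defeats the purpose of shipping code to a remote device.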

@dpgeorge added the py-core (Relates to py/ directory in source) label Jan 20, 2025
@dpgeorge
Member Author

@iabdalkader What do you think about this approach using marshal and making it CPython compatible?

@github-actions

github-actions bot commented Jan 20, 2025

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:   -32 -0.004% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:  +264 +0.072% TEENSY40
        rp2:    -8 -0.001% RPI_PICO_W
       samd:  +276 +0.103% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    -8 -0.002% VIRT_RV32

@codecov

codecov bot commented Jan 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.53%. Comparing base (3b62524) to head (e40a3fd).
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #16615      +/-   ##
==========================================
- Coverage   98.59%   98.53%   -0.06%     
==========================================
  Files         167      169       +2     
  Lines       21599    21807     +208     
==========================================
+ Hits        21295    21488     +193     
- Misses        304      319      +15     

@iabdalkader
Contributor

> @iabdalkader What do you think about this approach using marshal and making it CPython compatible?

I think it's much better: CPython compatibility is always better than custom API, and in the future this can be extended to support other things. I thought I should give it a quick test, and I can confirm everything is still working fine. Amazing, thank you!

This type(lambda: 0) allocates a bit of memory right? Couldn't we instead use something like type(help)? At least if it's used internally in a frozen library. Or perhaps one day we'll have types.FunctionType.

@dpgeorge force-pushed the extmod-add-marshal-module branch 7 times, most recently from dc7def8 to b63c48b, January 21, 2025 08:59
@dpgeorge
Member Author

> I think it's much better: CPython compatibility is always better than custom API, and in the future this can be extended to support other things.

OK, great. Then I'll stick with this approach of implementing the marshal module.

> This type(lambda: 0) allocates a bit of memory right? Couldn't we instead use something like type(help)?

Yes, it allocates a little RAM. But type(help) won't work because that's a built-in function and has a different underlying type, not the same as a bytecode function.

If you have frozen code then lambda: 0 is also frozen. And you only need to get the type once, so I think it's not really a problem to do this.

> Or perhaps one day we'll have types.FunctionType.

This already exists and you can use it here.
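To illustrate the point about function types, the following sketch compares the three candidates on CPython. `len` stands in for a built-in function here, because on CPython `help` is not a built-in function object; that substitution is an assumption made for the demonstration.

```python
import types

# Fetch the bytecode-function type once and reuse it afterwards.
fun_type = type(lambda: 0)

# types.FunctionType names the same type, so either spelling works.
same_type = fun_type is types.FunctionType

# A built-in function has a different underlying type, so it cannot be
# used to construct a function from a code object.
builtin_type = type(len)
builtin_differs = builtin_type is not fun_type
```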

@dpgeorge force-pushed the extmod-add-marshal-module branch 4 times, most recently from 720de59 to 9737cc6, January 22, 2025 04:35
@dpgeorge
Member Author

OK, this PR is done and ready for final review:

  • docs for the marshal module have been added
  • the new tests now provide full coverage of all code added in this PR
  • CI is fully green (except a glitch with codecov)
  • ports that don't have features added don't increase in size

@andrewleech
Contributor

This looks great, thanks! For reference, my earlier attempts to expose a __code__ object / attributes were meant to support introspecting function variables, in order to build a MicroPython pytest implementation supporting fixtures. From a quick skim / review of the code, this certainly looks like it can support that really well.

@dpgeorge added this to the release-1.25.0 milestone Feb 4, 2025
@projectgus self-requested a review February 4, 2025 02:49
Contributor

@projectgus left a comment

Few minor comments, but I'm really excited about this feature! Seems like it will enable some very interesting use cases for distributed MicroPython!

@dpgeorge force-pushed the extmod-add-marshal-module branch from 9737cc6 to 9667482, February 10, 2025 23:45
To make it easier to diagnose why CPython crashed.

Signed-off-by: Damien George <damien@micropython.org>

The `mp_obj_code_t` and `mp_type_code` code object was defined internally
in both `py/builtinevex.c` and `py/profile.c`, with completely different
implementations (the former very minimal, the latter quite complete).

This commit factors these implementations into a new, separate source file,
and allows the code object to have four different modes, selected at
compile-time:

- MICROPY_PY_BUILTINS_CODE_NONE: code object not included in the build.

- MICROPY_PY_BUILTINS_CODE_MINIMUM: very simple code object that just holds
  a reference to the function that it represents.  This level is used when
  MICROPY_PY_BUILTINS_COMPILE is enabled.

- MICROPY_PY_BUILTINS_CODE_BASIC: simple code object that holds a reference
  to the proto-function and its constants.

- MICROPY_PY_BUILTINS_CODE_FULL: almost complete implementation of the code
  object.  This level is used when MICROPY_PY_SYS_SETTRACE is enabled.

Signed-off-by: Damien George <damien@micropython.org>

This allows retrieving the code object of a function using
`function.__code__`, and then reconstructing a function from a code object
using `FunctionType(code_object)`.

This feature is controlled by `MICROPY_PY_FUNCTION_ATTRS_CODE` and is
enabled at the full-features level.

Signed-off-by: Damien George <damien@micropython.org>

Serialises a bytecode function/generator to a valid .mpy as bytes.

Signed-off-by: Damien George <damien@micropython.org>

This commit implements a small subset of the CPython `marshal` module.  It
implements `marshal.dumps()` and `marshal.loads()`, but only supports
(un)marshalling code objects at this stage.  The semantics match CPython,
except that the actual marshalled bytes are not compatible with CPython's
marshalled bytes.

The module is enabled at the everything level (only on the unix coverage
build at this stage).

Signed-off-by: Damien George <damien@micropython.org>

Signed-off-by: Damien George <damien@micropython.org>
@dpgeorge force-pushed the extmod-add-marshal-module branch from 9667482 to e40a3fd, February 11, 2025 05:59
@dpgeorge merged commit e40a3fd into micropython:master Feb 11, 2025
65 of 66 checks passed
@dpgeorge deleted the extmod-add-marshal-module branch February 11, 2025 06:56
@iabdalkader
Contributor

Just to confirm, we don't need to enable MICROPY_PERSISTENT_CODE_SAVE anymore, correct?

@dpgeorge
Member Author

> Just to confirm, we don't need to enable MICROPY_PERSISTENT_CODE_SAVE anymore, correct?

That is correct.
