[proof of concept] Implement parts of the core in Python bytecode #5025

dpgeorge · 2019-08-20T03:30:33Z

This is a proof of concept showing how parts of the MicroPython core can be reimplemented in Python bytecode. So far, this PR changes the builtin sum(iterable, start=0) function to a pure Python implementation that is "hand frozen" into the firmware.
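The PR itself ships hand-written bytecode rather than Python source. As a minimal sketch, assuming the semantics are simply repeated `+` starting from `start`, the Python that such bytecode corresponds to could look like this. The name `py_sum` is hypothetical, chosen so it does not shadow the real builtin.

```python
# Hypothetical pure-Python sum(iterable, start=0): a sketch of the kind of
# function this PR freezes into the firmware as bytecode.
def py_sum(iterable, start=0):
    total = start
    for x in iterable:
        # Dispatches to the operand types' own addition, so ints, floats
        # and lists all work without any type-specific code here.
        total = total + x
    return total
```

Unlike CPython's builtin, this sketch does not reject a string `start` value; it only illustrates the core loop.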

The main reason to do this is to reduce code size while retaining the same functionality. The code size change with this PR is:

   bare-arm:   -16 -0.024% 
minimal x86:   +32 +0.021% 
   unix x64:   -96 -0.019% [incl +32(data)]
unix nanbox:  -172 -0.039% 
      stm32:    -8 -0.002% PYBV10
     cc3200:   -16 -0.009% 
    esp8266:    -4 -0.001% 
      esp32:   -24 -0.002% GENERIC [incl +32(data)]
        nrf:    -4 -0.003% pca10040
       samd:   -12 -0.012% ADAFRUIT_ITSYBITSY_M4_EXPRESS

That's only a small decrease, but the same principle would scale to reimplementing much more built-in functionality (e.g. string methods) to get a decent reduction in code size.
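To illustrate what "string methods in Python" might mean, here is a hypothetical pure-Python stand-in for `str.ljust(width, fillchar=" ")`. It is not part of this PR; it only shows that such a method reduces to a few lines of Python once the core primitives (`len`, `+`, `*`) exist.

```python
# Hypothetical pure-Python equivalent of str.ljust, not taken from the PR.
def py_ljust(s, width, fillchar=" "):
    if len(fillchar) != 1:
        raise TypeError("fill character must be exactly one character long")
    pad = width - len(s)
    if pad <= 0:
        # Already wide enough: return the string unchanged.
        return s
    return s + fillchar * pad
```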

Some points to note:

- Eventually there would be a preprocessing step so that these functions can be written in Python rather than as hand-written bytecode.
- In general I'd expect a bytecode implementation to be smaller than the corresponding C implementation, although that might not always be the case.
- This PR adds some extra code to the VM to handle tracebacks (or the lack thereof) in built-in bytecode; that cost only needs to be paid once.
- Passing keyword arguments to sum() will crash the VM; a small check is needed to fix this (it would add a bit of code, but again only once).
- The bytecode for sum() was generated with mpy-cross and can actually be optimised down by one byte.
- The prelude for the bytecode is relatively large: 10 bytes, compared with 15 bytes for the function body itself. It would be possible to optimise the prelude for size, which would reduce the size of all bytecode functions, both built-in and user-defined.
- Note that the bytecode for sum() includes a default argument.
- Of course performance would be reduced, so ultimately there could be an option choosing between "functions in C for speed" and "functions in Python for size".

stinos · 2019-08-20T08:02:29Z

This is an interesting concept. Did you happen to measure performance differences?

dpgeorge · 2019-08-20T08:13:13Z

> Did you happen to measure performance differences?

No, I didn't get to that yet. I guess it wouldn't be too hard to do for sum, just by passing it various inputs.

dpgeorge · 2019-08-20T12:44:54Z

I did some benchmarking, running a simple loop like this:

def test():
    x = range(1000)
    for _ in range(300):
        sum(x)
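For anyone wanting to try the same loop on desktop CPython, a rough timing harness is sketched below. It assumes `time.perf_counter` (CPython; MicroPython ports generally offer `time.ticks_us` instead) and reuses a hypothetical pure-Python `py_sum`. The numbers above came from MicroPython's own benchmark runner on hardware, and CPython's builtin `sum` has fast paths, so the ratio on a desktop will not match the one reported here.

```python
import time

def py_sum(iterable, start=0):
    # Hypothetical pure-Python sum, standing in for the frozen bytecode version.
    total = start
    for x in iterable:
        total = total + x
    return total

def bench(fn, data, repeat=300):
    """Return elapsed seconds for `repeat` calls of fn(data), like test()."""
    t0 = time.perf_counter()
    for _ in range(repeat):
        fn(data)
    return time.perf_counter() - t0

if __name__ == "__main__":
    for label, data in (("range", range(1000)), ("list", list(range(1000)))):
        assert sum(data) == py_sum(data)  # same result before timing
        t_c = bench(sum, data)
        t_py = bench(py_sum, data)
        print("%s: builtin %.4fs, pure Python %.4fs, ratio %.1fx"
              % (label, t_c, t_py, t_py / t_c))
```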

Running on a PYBD-SF2 @ 120MHz I get:

diff of scores (higher is better)
N=100 M=100                   sum_c -> sum_bytecode         diff      diff% (error%)
builtin_sum_range.py        1023.28 ->     512.16 :    -511.12 = -49.949% (+/-0.14%)
builtin_sum_list.py         1106.45 ->     519.57 :    -586.88 = -53.042% (+/-0.20%)

So bytecode is half the speed of the existing C implementation.
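Since the scores are "higher is better", the diff and diff% columns read as the absolute and relative change in score. A small check of that arithmetic (our reading of the table, not code from the benchmark runner):

```python
# Recompute the "diff" and "diff%" columns from the before/after scores.
def score_diff(before, after):
    diff = after - before
    return diff, 100.0 * diff / before

rows = {
    "builtin_sum_range.py": (1023.28, 512.16),
    "builtin_sum_list.py": (1106.45, 519.57),
}
for name, (c_score, bc_score) in rows.items():
    diff, pct = score_diff(c_score, bc_score)
    print("%-22s %9.2f = %8.3f%%" % (name, diff, pct))
```

A relative change of about -50% in score is what "half the speed" means here.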

Then I used mpy-cross to compile the Python implementation of sum to native machine code ("native Python"), and put that in instead of the frozen bytecode. The benchmark results for this native version vs C were:

diff of scores (higher is better)
N=100 M=100                   sum_c -> sum_native         diff      diff% (error%)
builtin_sum_range.py        1023.28 ->     942.19 :     -81.09 =  -7.925% (+/-0.29%)
builtin_sum_list.py         1106.45 ->     998.18 :    -108.27 =  -9.785% (+/-0.22%)

That's not too bad, the native code generator is close to the C version!

See the latest commit for the native code blobs, for both x86-64 and Thumb2. These blobs are somewhat bigger than the C version, but with some optimisation of the native emitter they could become smaller and faster, possibly getting close to C.
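The PR embeds a blob produced ahead of time by mpy-cross. For user code, a roughly analogous route is MicroPython's `@micropython.native` decorator, which the MicroPython compiler recognises and uses to emit machine code instead of bytecode. The sketch below assumes that behaviour and adds a no-op stand-in so the same file also runs under CPython; `native_sum` is a hypothetical name.

```python
try:
    # On MicroPython the compiler itself handles @micropython.native.
    import micropython
except ImportError:
    # CPython has no micropython module: provide a no-op stand-in so the
    # decorator line below still works (the function stays plain Python).
    from types import SimpleNamespace
    micropython = SimpleNamespace(native=lambda fn: fn)

@micropython.native
def native_sum(iterable, start=0):
    total = start
    for x in iterable:
        total = total + x
    return total
```

The decorator must be written literally as `@micropython.native` because the MicroPython compiler matches that name; binding it to another name at runtime would not trigger native code generation.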

There is a lot of scope here for further work. As shown, it is possible to reimplement parts of the core in Python, which is then compiled either to bytecode (small but slow) or to native machine code (fast but bigger, and potentially on par with the existing C implementation). A benefit of writing in Python rather than C is that the underlying architecture of the system (MicroPython, the VM, etc.) is hidden. For example, whether exceptions are implemented using NLR or simple return codes is irrelevant to the Python implementation of a function (e.g. sum()), so the underlying architecture can be changed more easily.

This is very meta and gets into the territory of PyPy (MicroPyPy!), where the interpreter is written in a dialect of itself: RPython, a restricted subset of Python.

nevercast · 2019-08-30T03:01:57Z

This is a very cool demonstration Damien. I'd like to see arguments, eventually, to both the make process for building MicroPython, and to mpy-cross to favour speed vs. size, that would set a MACRO that we can use throughout the code base to prefer for example Python vs C, like you have here.

Perhaps an option to favour less heap or more heap too?

dpgeorge · 2019-08-30T07:28:35Z

> I'd like to see arguments, eventually, to both the make process for building MicroPython, and to mpy-cross to favour speed vs. size, that would set a MACRO that we can use throughout the code base to prefer for example Python vs C, like you have here.

Yes, that makes sense: speed vs size (like -Os vs -O2).

> Perhaps an option to favour less heap or more heap too?

It might be tricky to make that option orthogonal to speed vs size. In general the code always attempts to reduce heap usage.


projectgus · 2024-03-07T23:53:49Z

This is an automated heads-up that we've just merged a Pull Request
that removes the STATIC macro from MicroPython's C API.

See #13763

A search suggests this PR might apply the STATIC macro to some C code. If it
does, then next time you rebase the PR (or merge from master) then you should
please replace all the STATIC keywords with static.

Although this is an automated message, feel free to @-reply to me directly if
you have any questions about this.

dpgeorge added 2 commits August 20, 2019 13:01

py/vm: Add support for built-in bytecode funcs without traceback.

98ebf62

py/modbuiltins: Rewrite sum() function in pure bytecode.

ac900af

py/modbuiltins: Add implementation of sum() in native Python.

df765fe

jimmo mentioned this pull request Oct 30, 2019

Cancelling coroutines: can't pend throw to just-started generator. #5242

Closed

dpgeorge mentioned this pull request May 4, 2021

Add listdir to unix/modos.c #3200

Closed

tannewt added a commit to tannewt/circuitpython that referenced this pull request Jul 21, 2021

Merge pull request micropython#5025 from DavePutz/issue_5016

db0adf1

Turn off PWM pin during PulseOut construct

dpgeorge mentioned this pull request Oct 20, 2021

Two argument form of iter() is not implemented. #5384

Open

dpgeorge added the py-core label Nov 30, 2021

dpgeorge mentioned this pull request Feb 1, 2022

Frozen python code as part of machine module? #8241

Closed

Gadgetoid mentioned this pull request Feb 29, 2024

global: Remove the STATIC macro. #13763

Merged
