
Supporting dynamically loaded native (machine code) modules #583

Closed
pfalcon opened this issue May 7, 2014 · 24 comments


@pfalcon
Contributor

pfalcon commented May 7, 2014

This is an informational ticket to refer people to.

Currently, MicroPython doesn't have an analog of CPython's ".pyd" modules - i.e. native shared/dynamic libraries compiled from C code.

It might be more or less easy to add such support for a particular port; it is, on the contrary, quite hard to add support for MicroPython as a whole, i.e. support which would work consistently across all ports and builds.

The culprits are:

  1. The differences between various dynamic library formats are too big, so we cannot rely on any "advanced" features, and should provide a "least common denominator" for API symbol lookup (an array of function pointers, passed to the module init function).
  2. Baremetal ports don't have OS-provided dynamic loading support at all, so they will need to implement their own loader, and then provide build support for the corresponding binary format.
  3. Lack of a public API in MicroPython. Or rather, currently all APIs are "public" and none are stable.
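To make point 1 concrete, here is a hypothetical Python simulation of the "least common denominator" ABI: the host passes the module's init function a flat array of function pointers, and the module binds them by fixed, agreed-upon index, so no dynamic symbol lookup is needed. (The real implementation would be C; the table layout and all names here are invented for illustration.)

```python
# Fixed table layout agreed between host and module (slot index -> meaning).
MP_OBJ_NEW_INT = 0
MP_PRINT_STR = 1

def host_build_table(printed):
    # The host (interpreter) fills the table with its runtime entry points.
    return [
        lambda x: ("int", x),          # slot 0: wrap an integer
        lambda s: printed.append(s),   # slot 1: print a string
    ]

def module_init(table):
    # The loaded module knows only the table layout, nothing else about
    # the host's symbols -- the "least common denominator" lookup.
    new_int = table[MP_OBJ_NEW_INT]
    print_str = table[MP_PRINT_STR]
    print_str("modx loaded")
    return {"add1": lambda n: new_int(n + 1)}

printed = []
exports = module_init(host_build_table(printed))
```

The key property is that host and module agree only on slot indices, which is portable across any binary format.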
@pfalcon
Contributor Author

pfalcon commented May 7, 2014

@errordeveloper suggests looking at Contiki ELF loader for point 2.

@dpgeorge
Member

dpgeorge commented May 7, 2014

I would support this. Having support for loadable native code has been at the back of my mind since the beginning of this project.

@stinos
Contributor

stinos commented May 14, 2014

  1. Would this be the final approach chosen, i.e. an array of function pointers passed to a single module entry-point function? The function pointers alone probably aren't sufficient, since constants like mp_const_none would be needed by the module, but that could be solved by passing a struct containing the function pointers and those constants, or by creating functions that return those constants.
  2. For unix this would use modffi or dlopen(), right?
  3. Is that still a problem if point 1 is implemented? I mean, if the module only uses the functions passed in and only looks at the uPy headers for the struct definitions, does it matter that everything is public?
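The struct-based variant suggested in point 1 can be sketched in hypothetical Python (SimpleNamespace stands in for a C struct; all names are invented for illustration): constants travel inside the same record as the function pointers, so the module never needs getter functions.

```python
from types import SimpleNamespace

def host_make_api():
    none_obj = object()                     # stands in for mp_const_none
    return SimpleNamespace(
        const_none=none_obj,                # constants travel in the struct
        obj_new_int=lambda x: ("int", x),   # function "pointers"
    )

def module_entry(api):
    # The module reaches constants directly as struct fields.
    return {"none": api.const_none, "one": api.obj_new_int(1)}

api = host_make_api()
exports = module_entry(api)
```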

I already threw together a proof-of-concept wrapping layer for registering modules/functions/classes which should come in handy when writing custom C++ modules: https://github.com/stinos/micropython-wrap

@stinos
Contributor

stinos commented Jun 4, 2014

Any new ideas on this? This would really help extend uPy. More than once I've read here or on the forum something like 'this doesn't belong in the core, so it should be a Python module', while the other option, 'it can be a native module', has some clear advantages (and disadvantages of course, but still).

@dpgeorge
Member

I've started to implement this; see the loadable-native branch. It includes a simple example called "modx", found under extmod/modx, that works with the unix port. In principle it should also work with the stmhal port, though that is untested at the moment.

The binary format for the external module is very simple: an 8-byte header including a version number, followed directly by the machine code. The entry point is the start of the machine code, and it gets called with a pointer to a table containing pointers to uPy constants and functions. So far it's just a proof of concept.
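A hedged sketch of how a loader might parse such a format (the magic value and header layout here are assumptions for illustration, not the actual branch's format):

```python
import struct

MAGIC = b"MPX\x00"   # hypothetical 4-byte magic
VERSION = 1          # hypothetical format version

def pack_module(machine_code):
    # 8-byte header (magic + little-endian version), then raw machine code.
    return MAGIC + struct.pack("<I", VERSION) + machine_code

def load_module(blob):
    if blob[:4] != MAGIC:
        raise ValueError("not a native module")
    (version,) = struct.unpack("<I", blob[4:8])
    if version != VERSION:
        raise ValueError("unsupported version %d" % version)
    # On a real port the remaining bytes would be copied to executable
    # memory and the entry point (offset 0) called with the function table.
    return blob[8:]
```

The appeal of such a format is that the loader is a few dozen lines and identical on every port, unlike an ELF loader.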

@pfalcon
Contributor Author

pfalcon commented Jan 22, 2015

Do I understand right that you want to define your own, system-independent executable format? Will that really be portable enough? It depends on -fPIC availability, and your linker script doesn't handle .data or .bss. Otherwise, the idea is neat.

The semantics are also different - these modules should be loadable by "import", while what's implemented now is more like generic machine-code loading. I assume that's just for testing, unless you intend to support import by overriding __import__ with Python code (which doesn't work now). I'm also not sure why another function type is required.

@dpgeorge
Member

Do I understand right that you want to define own, system-independent executable format?

Yes. That way you have a fixed, simple, single piece of code that does the loading for all archs and ports.

Will that really be portable enough?

The concept is so far proven on unix x86, unix x64 and pyboard :) (see the extmod/modx demo)

That depends on -fPIC availability

We are definitely going to need that. Otherwise we must write our own linker.

and your linkscript doesn't handle .data, .bss . Otherwise, idea is neat.

I thought about this and had the bss feature nearly implemented. But the root-pointer stuff posed a problem, and I wanted it to be simple. The solution is: you allocate a bytearray and store it in the global dict (mp_store_global(...)). The global dict is unique to your loaded module, just like when importing a file.
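A minimal Python sketch of this workaround (module_globals here stands in for the per-module global dict; mp_store_global itself is a C API, and the names below are illustrative only): mutable state lives in a bytearray reachable from the globals dict, so the GC's root scan keeps it alive without any .bss section.

```python
module_globals = {}   # stands in for the module's own global dict

def module_init():
    # Equivalent of mp_store_global(qstr_for("_state"), bytearray(64)):
    # allocate mutable state on the heap instead of in a .bss section.
    module_globals["_state"] = bytearray(64)

def bump_counter():
    # All "static" state is accessed through the globals dict.
    state = module_globals["_state"]
    state[0] += 1
    return state[0]
```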

The semantics are also different - these modules should be loadable by "import", while what's implemented now is more like generic machine-code loading. I assume that's just for testing, unless you intend to support import by overriding __import__ with Python code (which doesn't work now).

Ok. I didn't get this far, and opened this branch precisely to discuss such things :)

How does CPython work in this case?

I'm also not sure why another function type is required.

Because it needs to capture the global dict.

@pfalcon
Contributor Author

pfalcon commented Jan 22, 2015

How does CPython work in this case?

Common wisdom from py2 times is that when doing "import foo", Python looks first for "foo.pyd", then for "foo.py" in the current target dir. I've never heard of a package hierarchy being implemented via a native module; that's clear overcomplication, as the common pattern for native modules is anyway to put in just the known performance-critical functions, and then wrap them in Python for a structured API.
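That lookup order can be sketched in a few lines (the resolver name and signature are invented for illustration; real import machinery does much more):

```python
def resolve(name, files, native_ext=".pyd"):
    # For "import foo", prefer the native extension over the pure-Python
    # file in the same directory, mirroring the py2-era convention above.
    for ext in (native_ext, ".py"):
        candidate = name + ext
        if candidate in files:
            return candidate
    raise ImportError(name)
```

For MicroPython the native extension would be ".mpy" instead of ".pyd", but the precedence logic is the same.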

@dpgeorge
Member

See 1f28413 for implementation of "import" for native modules.

dpgeorge referenced this issue Feb 16, 2015
Can now "import" native module as you would a normal .py module:

>>> import modx
>>> modx.data
>>> modx.add1(1)

where modx.mpy (compiled from extmod/modx) is in your path.
@pfalcon
Contributor Author

pfalcon commented Feb 16, 2015

From 1f28413#commitcomment-9769514:

@dpgeorge: Did you give any thought to how to write modules which can be compiled both statically and dynamically? IMHO, that should be a basic requirement; otherwise there will be a bunch of unportable/unmanageable modules.

@dpgeorge
Member

Replying to some previous questions.

Any idea when this feature will be merged? is it stable or are there still some rough edges?

Lots of rough edges. It's basically proof of concept and needs work. If we work on it then it can be merged sooner rather than later :)

what's needed for this to be merged into the master branch?

  1. Implement a proper ABI with all required bindings to the uPy runtime, possibly merged with the stuff in py/nativeglue.c.
  2. Decide if we want to use our own binary format (as it currently is), or switch to something common like elf.
  3. Decide if we want/need to support data/bss sections, and, if so, how it will be done. Could postpone this until a later version of the ABI.
  4. Work out how to make the modules configurable so they can be statically or dynamically compiled.
  5. Make user friendly build scripts for dynamic modules (see extmod/modx for how it's done at the moment).

@dpgeorge
Member

Some more questions from elsewhere.

it should match Python semantics for module import and not unnecessarily diverge from it

I'm pretty sure it does. You just do "import mymod" and if it finds "mymod.mpy" in your path then it loads it as if it were "mymod.py".

and there should be (a few) examples of what can really be done with it.

There are lots of examples for pyboard: drivers for hardware that require some fast, low-level C code. But we want a unix module to test it easily, such as random, or to convert an existing one like ure.

@dpgeorge
Member

Did you give any thought to how to write modules which can be compiled both statically and dynamically?

Not really. I just wanted to get the basics working to see how/if it would work. From here we can think about having the same code compilable in static or dynamic mode.

@danicampora
Member

@dpgeorge:

Lots of rough edges. It's basically proof of concept and needs work. If we work on it then it can be merged sooner rather than later :)

OK, I will help as much as possible to get it working nicely ;-). I really think that this is a very cool feature that will add a lot of flexibility to MicroPython.

Decide if we want to use our own binary format (as it currently is), or switch to something common like elf.

As @pfalcon pointed out in #222, I don't think supporting ELF is a good idea. I believe the current binary format will keep the loader more compact and easier to maintain.

Decide if we want/need to support data/bss sections, and, if so, how it will be done. Could postpone this until a later version of the ABI.

Is this really necessary? Why not put everything in the same section, as is currently done?

Work out how to make the modules configurable so they can be statically or dynamically compiled.
Make user friendly build scripts for dynamic modules (see extmod/modx for how it's done at the moment).

Again, why would we want static modules? In case -fPIC is not available?

I think choices should be made to keep it simple and avoid code bloat. Choose one way to get native modules working across all supported platforms and stick to it.

Probably my questions/remarks show my lack of knowledge about this topic, so feel free to make me look bad ;-)

@danicampora
Member

@dpgeorge @pfalcon I pushed d6c19e9 and ac60531 to make it work for the CC3200. I also added a makefile to build external native modules (it's inside the cc3200 directory) that could be used as a template. It can be improved, of course; I have been testing it and so far it works quite well.
Following this approach, if we add more function pointers for accessing MicroPython's API, and also port-specific functions, then a lot of cool things can be done with native modules.

@danicampora
Member

Also, CPython semantics are followed, since you can do:

>>> import modx
>>> modx.add1(1)
2

also:

>>> from modx import add1
>>> add1(1)
2

and:

>>> from modx import *

@danicampora
Member

Implement a proper ABI with all required bindings to the uPy runtime, possible merged with the stuff in py/nativeglue.c.

Yes, we can add a pointer to mp_fun_table, but I think we will need more stuff than what is included there... stuff like mp_obj_get_int and friends, or is there a better way around it? What if the port is configured with MICROPY_EMIT_NATIVE disabled?

@pfalcon
Contributor Author

pfalcon commented Nov 13, 2015

I was astonished to be confronted face to face with the fact that the Linux kernel doesn't control dynamic object loading in any way. Certainly I knew that stuff is handled by a standalone dynamic linker executable, and saw that it has quite a bunch of code, but I thought it came from the standard do-more-bloat and reinvent-the-wheel design practices. But nope: with all the bloat the Linux kernel has, dynamic loading is a completely userspace matter, i.e. subject to conventions. And conventions are there to break, which libc vendors do with vigor. E.g. uClibc releases just 5 micro-versions apart are blatantly incompatible, in such a way that a toolchain built with one version can't produce an executable with shlib dependencies which loads successfully on a system running the older uClibc version.

Well, then a custom format is definitely the right approach, and I'm +10 for merging whatever is available into master and digging further in that direction. (Though that of course doesn't answer how to reuse common code (e.g. libc) among different modules.)

@pfalcon
Contributor Author

pfalcon commented Nov 13, 2015

To clarify what didn't fit into the rant above: I consider the experiment of providing the majority of stdlib functionality using FFI against loadable system libraries to have failed. It went pretty far, but there were already quite a few ABI issues, and what's described above means that the only robust way to build uPy for an arbitrary system is static linking. It's a common issue that it's not possible to link with -ldl when building statically, but even if that is worked around, it may be of zero use: you cannot load "foreign" shared libs on the system you're running on with the dlopen() of the system you built with (that's e.g. the case with uClibc 0.9.28 vs 0.9.33 (MIPS)).

So, I'd be ready to move more stuff into C code. But moving too much code means bloating the executable, so support for loadable modules is needed.

@dpgeorge
Member

Ok, I see (summary: no hope to build a standalone uPy binary with ffi support).

So, I'd be ready to move more stuff into C code.

You mean move existing Python scripts to C?

@pfalcon
Contributor Author

pfalcon commented Nov 13, 2015

Ok, I see (summary: no hope to build a standalone uPy binary with ffi support).

It will work for well-behaving systems, but each non-well-behaving system will require complicated individual treatment. That contradicts the idea I've always had: "Run uPy everywhere".

You mean move existing Python scripts to C?

I mean e.g. this: #1550 (comment)

@pfalcon
Contributor Author

pfalcon commented Nov 13, 2015

The first step was apparently to make upip not depend on FFI, and fortunately that was quite easy; it's already in master.

@traverseda

Seeing people build things like https://blog.littlevgl.com/2019-02-20/micropython-bindings makes me think this could be a pretty awesome addition.

@dpgeorge
Member

Dynamically loadable native modules are now available, see aad79ad

The approach taken was a custom binary file format (an extension of the existing .mpy format) and a custom linker which converts .o to .mpy. To comment on some of the above points with regard to this approach:


It might be more or less easy to add such support for a particular port; it is, on the contrary, quite hard to add support for MicroPython as a whole, i.e. support which would work consistently across all ports and builds.

The approach taken is generic and works across all existing ports.

Baremetal ports don't have OS-provided dynamic loading support at all, so they will need to implement their own loader, and then provide build support for the corresponding binary format.

There is a simple relocation/linking function which does the loading in a generic way: py/persistentcode.c:mp_native_relocate
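For intuition, here is a hedged Python simulation of what a minimal relocation pass does; the actual logic and relocation-entry format in py/persistentcode.c:mp_native_relocate differ, and this sketch is an assumption for illustration only.

```python
import struct

def relocate(code, relocations, base):
    # Each relocation is the offset of a 4-byte little-endian slot that
    # holds an address relative to the module start; the loader adds the
    # actual load base so the code works wherever it was placed in memory.
    buf = bytearray(code)
    for off in relocations:
        (val,) = struct.unpack_from("<I", buf, off)
        struct.pack_into("<I", buf, off, (val + base) & 0xFFFFFFFF)
    return bytes(buf)
```

Because the loader, not the OS, applies these fixups, the same scheme works on baremetal ports with no dynamic linker at all.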

Lack of a public API in MicroPython. Or rather, currently all APIs are "public" and none are stable.

The ABI for native code in .mpy files is defined by the function table in py/nativeglue.h, and the dynamic native interface in py/dynruntime.h.


Implement a proper ABI with all required bindings to the uPy runtime, possibly merged with the stuff in py/nativeglue.c.

Done. See py/dynruntime.h.

Decide if we want to use our own binary format (as it currently is), or switch to something common like elf.

Use own binary format, .mpy files.

Decide if we want/need to support data/bss sections, and, if so, how it will be done. Could postpone this until a later version of the ABI.

BSS sections are supported. Data is not (but can use BSS instead).

Work out how to make the modules configurable so they can be statically or dynamically compiled.

Many examples of this are provided, see e.g. extmod/modure.c. Grep for MICROPY_ENABLE_DYNRUNTIME.

Make user friendly build scripts for dynamic modules (see extmod/modx for how it's done at the moment).

Done. See py/dynruntime.mk and examples in examples/natmod/.


Yes, we can add a pointer to mp_fun_table, but I think we will need more stuff than what is included there... stuff like mp_obj_get_int and friends, or is there a better way around it?

The mp_fun_table is used to get access to the runtime functions, and many API functions are made available through macros in py/dynruntime.h.


In summary: this feature is now fully implemented.
