Towards consistent Foreign Function/Data Interface across unix/baremetal #478

Closed
pfalcon opened this issue Apr 13, 2014 · 19 comments

@pfalcon
Contributor

pfalcon commented Apr 13, 2014

Let's first settle on terminology: this stuff is usually called FFI (Foreign Function Interface). One response could be that the STM and other MCU ports don't need to call "foreign functions", but even to call functions they need to deal with "foreign data", so it's closely related. Also, there's no need to dismiss FFI for MCUs either: there's a growing tendency among vendors to include ROMs with drivers/routines in their designs (NXP and TI were spotted).

Anyway, the unix port has an "ffi" module based on libffi, and the stmhal port recently got an "stm" module (3f48984). They offer (in the data access category):

  • ffi: ability to access variables of simple types imported from dynamic libs.
  • stm: ability to read/write memory of standard sizes from (almost) arbitrary address.

Before going further with any of them, it would be nice to consider what Python offers to help with this area:

  • struct module: take something having buffer interface and parse it as a structure, with various data types and alignments support.
  • array module: allocate a memory array of native types and access it through the sequence interface.
  • int.from_bytes()/int.to_bytes(): convert between native and Python int - concisely and with builtin types only.
  • memoryview: exposes the C-level buffer interface as a bytes-like object. An alternative view is that this creates a pointer to an object and allows accessing the underlying object via that pointer.
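
As a rough illustration of the four facilities listed above, here is a minimal sketch in standard CPython. The buffer contents and variable names are made up for the example, and uPy's coverage of these APIs may differ.

```python
import array
import struct

# Eight bytes standing in for some foreign data (contents are arbitrary).
raw = bytearray(b"\x01\x00\x02\x00\x00\x00\xff\xff")

# struct: parse a buffer as a structure (little-endian u16, u32, s16).
a, b, c = struct.unpack_from("<HIh", raw, 0)

# array: a typed array of native values with a sequence interface.
arr = array.array("H", [1, 2, 3])
arr[1] = 513

# int.from_bytes()/int.to_bytes(): convert using builtin types only.
first = int.from_bytes(raw[0:2], "little")
packed = (513).to_bytes(2, "little")

# memoryview: a pointer-like view; writes go to the underlying object.
view = memoryview(raw)
view[0] = 7
```

Note that the memoryview write changes `raw` itself, while the values unpacked earlier are independent copies.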

All of these but the last are already partially implemented in uPy. Besides implementing a pure memoryview object, the last big step would be to implement a memoryview-alike created from an arbitrary memory address. This object could then be used with struct and from_bytes/to_bytes. The final small step is to extend array.array so it can use a memoryview as its backing store instead of always allocating new memory.
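
To make the "memoryview-alike from an arbitrary address" idea concrete, here is a hedged sketch for CPython only. It borrows ctypes purely to obtain a buffer at a raw address; `view_at` is a hypothetical helper, not a uPy or CPython API, and the address used is that of an ordinary bytearray so the example stays safe to run.

```python
import ctypes
import struct


def view_at(addr, size):
    """Return a writable memoryview over `size` bytes starting at `addr`.

    Hypothetical helper: the caller must guarantee the address is valid.
    """
    raw = (ctypes.c_ubyte * size).from_address(addr)
    return memoryview(raw).cast("B")


# Stand-in for a peripheral: take the address of a normal bytearray.
backing = bytearray(8)
addr = ctypes.addressof(ctypes.c_ubyte.from_buffer(backing))

regs = view_at(addr, len(backing))
struct.pack_into("<I", regs, 0, 0xDEADBEEF)   # struct works on the view
value = int.from_bytes(regs[0:4], "little")   # so does int.from_bytes
```

The write through `regs` lands in `backing`, which is the behaviour one would want from a view over device memory.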

That's the idea. To clarify: I'm not calling for dismissal of the existing ffi/stm modules. I'm just calling for not creating further ad hoc extensions in them if the extended functionality can be covered well, consistently, and in an advanced manner by native Python interfaces.

Comments?

@pfalcon pfalcon added the rfc label Apr 13, 2014
@dpgeorge
Member

Agree. Agree that it should be as Pythonic as possible, reusing appropriate Python libraries/types if they exist.

Main reason for introducing stm module was, as is now common in uPy evolution, pragmatic: to give basic support of constants to the inline assembler. As a side effect, these constants are also useful in normal Python functions, eg specifying PULL_UP for a gpio pin. Having raw memory read/write functions was then an obvious thing to add.

@pfalcon
Contributor Author

pfalcon commented Apr 19, 2014

Re: b11b85a

Ok, if @dpgeorge went for this, then maybe it's worth bringing up a problem which appears when using FFI. It's not 100% clear which access is required for a buffer used as an arg to an FFI func: the external function can either read or write it. So it necessitates having access codes for "read", "write", "read and write", and "read OR possibly write". That's a bit of a hack, of course.

A much better alternative would of course be to allow specifying a "const" modifier, but how? FFI follows the struct module's typecodes, so it doesn't have complete freedom to add modifiers in a consistent manner. An alternative would be to specify arg types not as a string like "iiPi", but rather as an array ["i", "i", "Pc", "i"], but that means more memory usage. Of course, we might go for extended string formats like "iiP*i", assuming it's clear that "*" means "const" (but it sure isn't clear, especially with a star).

@dpgeorge
Member

What is the specific example where the read/write access is unclear?

@pfalcon
Contributor Author

pfalcon commented Apr 21, 2014

@dpgeorge : I hope you're asking with complete context of my previous comment in mind. Then: https://github.com/micropython/micropython-lib/blob/master/os/os.py , both read and write defined as:

read_ = libc.func("i", "read", "iPi")
write_ = libc.func("i", "write", "iPi")

But we know that write doesn't modify the memory pointed to by its 2nd arg, while read does. modffi doesn't know that, though, and for example allows passing a string to read(), which should be disallowed.

Btw, I almost decided to define "P" as const void *, and "p" as void *. The latter clashes with struct's definition of "p", but c'est la vie. But if you have better ideas, they're welcome.
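
For illustration only, here is a sketch of what enforcing such a "P" (const) versus "p" (mutable) distinction could look like. `check_pointer_args` is a hypothetical helper written for CPython; it is not part of modffi, and it only inspects buffer writability.

```python
def check_pointer_args(typespec, args):
    """Reject read-only buffers passed where the callee may write.

    Assumed convention: "P" means const void *, "p" means void *.
    """
    if len(typespec) != len(args):
        raise TypeError("expected %d arguments" % len(typespec))
    for code, arg in zip(typespec, args):
        # Only mutable-pointer slots need a writable buffer.
        if code == "p" and memoryview(arg).readonly:
            raise TypeError("read-only buffer passed as void *")


# write(fd, buf, n) only reads buf, so immutable bytes are acceptable.
check_pointer_args("iPi", (1, b"hello", 5))

# read(fd, buf, n) fills buf, so it needs something writable.
check_pointer_args("ipi", (0, bytearray(16), 16))

try:
    check_pointer_args("ipi", (0, b"immutable", 9))
    rejected = False
except TypeError:
    rejected = True
```

A check like this costs a typespec lookup per call, which is the kind of overhead the thread is wary of.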

@dpgeorge
Member

@pfalcon: I see, I thought you meant that it's unclear whether an external function may read or write, or both, or none, depending on some unknown state/variables associated with that function. But what you are saying is that there is no way (currently) to specify in ffi language that something is a read-only pointer.

How does CPython's ffi do it? I would say you would want to extend the mini-language for type specifications to include a constant pointer.

@pfalcon
Contributor Author

pfalcon commented Apr 21, 2014

> How does CPython's ffi do it? I would say you would want to extend the mini-language for type specifications to include a constant pointer.

CPython's "ctypes" module has an extensive, verbose, and bloated types engine. It's also generic: it can define any structure, pointer to structure, or array of pointers, with all operations offering type tracking and safety. The reason for developing our own "ffi" module was the desire not to have all those cycle- and memory-stealing complexities.

@pfalcon
Contributor Author

pfalcon commented Apr 21, 2014

Note that libffi also allows defining struct layout, but I explicitly don't want to wrap that, because it overlaps with the "struct" module. The "struct" module is not powerful enough to do structure operations, but I still don't want to have a libffi-specific solution either.

The rest of this discussion is at #512 (comment)

@pfalcon
Contributor Author

pfalcon commented Apr 21, 2014

> Btw, I almost decided to define "P" as const void *, and "p" as void *. The latter clashes with struct's definition of "p", but c'est la vie. But if you have better ideas, they're welcome.

The problem is that modffi currently doesn't store the typespec string and converts values context-independently, and checking types will of course have a performance hit. So, let's have that specced out, but not enforced ;-).

@dhylands
Contributor

I figured some things out, and I thought I would share. One of my issues with ctypes was that if you wanted to use nested structures and access something like:

a = foo.bar.nested.blah

Then foo, and foo.bar, and foo.bar.nested would all create instances of objects each and every time you referenced them. This essentially makes it unsuitable for use inside ISRs wanting to access registers (which is my primary interest).

Over in #512 @pfalcon had made a comment proposing something along the lines of

TIMER_LAYOUT = {...}
timer0 = sstruct(TIMER_LAYOUT, TIMER0_BASE)

It's even possible to have timer0 be constant and come from flash. Now we get to the insight that I had.
timer0.subsstruct will call load_attr on timer0's type. It will need to allocate a RAM object to keep track of where we are in the structure, but this is the only RAM object that we need to allocate.

I'm going to call this RAM object a selector (and I'm open to suggestions for alternative names). So timer0's load_attr would return a selector. Then as you move deeper into the structure, selector's load_attr will get called which will update some internal state in the selector object, and it will return itself (that is the really significant piece of the puzzle for me). Eventually you hit a primitive, at which point you actually go and read the memory and return it as indicated by the primitive. Simple primitives won't cause any additional memory allocations, more complex ones will (for example if a float were required).
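
A rough pure-Python sketch of the selector idea follows. All names here (`TIMER_LAYOUT` contents, `Selector`, `SStruct`) are illustrative assumptions, a bytearray stands in for register memory, and `__getattr__` plays the role of load_attr. This sketch creates a fresh selector on every top-level attribute access, which is one possible policy rather than a settled design.

```python
import struct

# Hypothetical layout: name -> (offset, format) for primitives,
# name -> (offset, nested dict) for sub-structures.
TIMER_LAYOUT = {
    "cr1": (0x00, "<H"),
    "ccmr": (0x18, {
        "inp": (0x00, "<H"),
        "out": (0x02, "<H"),
    }),
}


class Selector:
    """The single mutable object that tracks the current position."""

    def __init__(self, mem, layout, offset):
        self._mem = mem
        self._layout = layout
        self._offset = offset

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            offset, kind = self._layout[name]
        except KeyError:
            raise AttributeError(name) from None
        if isinstance(kind, dict):
            # Descend: update internal state and return *itself*.
            self._offset += offset
            self._layout = kind
            return self
        # Primitive reached: actually read the memory.
        return struct.unpack_from(kind, self._mem, self._offset + offset)[0]


class SStruct:
    """Immutable description: a layout bound to a block of memory."""

    def __init__(self, layout, mem, base=0):
        self._sel_args = (mem, layout, base)

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(Selector(*self._sel_args), name)
```

With this, `timer0 = SStruct(TIMER_LAYOUT, mem)` followed by `timer0.ccmr.out` allocates one `Selector` however deep the nesting goes.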

So now that I have this figured out, the rest is pretty straightforward. The nice thing about this is that you can use this technique to traverse any data structure which describes the layout. I actually kind of like ctypes' way of declaring the layout, just not its way of traversing. It allows for Structures, Unions and Arrays, all of which are needed to properly declare the timer registers, and it supports bit fields.

So I think I have a direction forward. It will work for registers, and it will work for other generic things as well.

I'm not particularly stuck on using ctypes to describe the layout, but it does satisfy all of my requirements.

@pfalcon
Contributor Author

pfalcon commented May 25, 2014

> Then foo, and foo.bar, and foo.bar.nested would all create instances of objects each and every time you referenced them. This essentially makes it unsuitable for use inside ISRs wanting to access registers (which is my primary interest).

Hmm, I don't remember an emphasis on ISR-usability. I'm sorry if I just missed this requirement.

But anyway, let's rephrase "would all create instances of objects" as "would all return instances of objects". If there's anything to compute to put into such objects, they would need to be created too. But if all info is somehow precomputed, it wouldn't need to create them, just return such precomputed objects (stored in ROM for example).

And as ultimately we deal with memory access, the thing to compute is physical memory address. So, if we have it, we don't need to allocate anything, just need to return pointer to existing (sub)object (part of larger object structure). Taking into account previous discussion, this leads to 3 choices:

  1. Structure fields specified in order, without explicit offsets. Offsets would need to be pre-calculated before usage.
  2. Structure fields with explicit offsets, order is thus not important.
  3. Structure fields with physical memory address. All objects used to encode layout should be of special types (to allow to override load_attr operation on them).

I still think it would be possible to use same basic data structure for all the cases above (and thus single algorithm and code).

Note that this proposes one solution to your problem, which wouldn't require any memory allocation during access at all, if case 3 is used and stored in ROM. That of course would come at the expense of needing to repeat structures of similar layout that deal with different memory addresses (like several timer blocks).
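
As a sketch of how choice 1 can be reduced to choice 2, the following pre-calculates offsets from an ordered field list. The helper name, the field names, and the natural-alignment rule are assumptions made for this example; real hardware layouts may need explicit padding instead.

```python
import struct


def compute_offsets(fields):
    """Turn ordered (name, format) pairs into {name: (offset, format)}.

    Each field is aligned to its own size, which is assumed to be a
    power of two (true for the simple integer formats used here).
    """
    layout = {}
    offset = 0
    for name, fmt in fields:
        size = struct.calcsize(fmt)
        offset = (offset + size - 1) & ~(size - 1)  # round up to alignment
        layout[name] = (offset, fmt)
        offset += size
    return layout


# Choice 1: fields in order, no explicit offsets.
ordered_fields = [("flags", "<B"), ("count", "<I"), ("status", "<H")]

# Choice 2: the same structure with explicit offsets, order irrelevant.
explicit = compute_offsets(ordered_fields)
```

Choice 3 would go one step further and add a base address to each offset, at the cost of one such table per hardware block.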

@pfalcon
Contributor Author

pfalcon commented May 25, 2014

> I'm going to call this RAM object a selector (and I'm open to suggestions for alternative names). So timer0's load_attr would return a selector. Then as you move deeper into the structure, selector's load_attr will get called which will update some internal state in the selector object, and it will return itself.

Ok, so this is another solution to the problem, which will need to allocate some RAM, but will allow benefiting from reusing a layout structure for similar blocks at different memory locations.

Well, the big concern is concurrent access to such a selector. It should work for many cases, but when it breaks, it will be quite unpleasant. So there should be some way to detect such concurrent (or just 2nd-in-a-row) access to a selector and throw an error. Intuitively that should be doable; the details should be thought out in the actual implementation, though.

So, that leaves 2 alternative (but not mutually exclusive) ways to resolve the ISR accessibility issue, so IMHO it still looks good to have a single module to rule them all.

@pfalcon
Contributor Author

pfalcon commented May 25, 2014

> I'm not particularly stuck on using ctypes to describe the layout, but it does satisfy all of my requirements.

And I would again like to reiterate that I don't like direct ctypes syntax (which uses a class definition around a layout definition). Just think how ctypes implements it: via metaclasses. So there's a "standard" class definition, which (roughly speaking) is passed to some function to be processed at runtime to generate a direct layout structure with which ctypes internals can work. This is an extra, superfluous step. Instead, I was contemplating what such a layout structure would be, one on which a memory access module can work directly and which, at the same time, would be expressible using standard Python data types.

@dhylands
Contributor

> Hmm, I don't remember an emphasis on ISR-usability. I'm sorry if I just missed this requirement.

I probably forgot to state it. If you want to write ISRs in Python, you need access to registers or APIs that will do the work for you. The timer block is quite complex and I'd like to try implementing some of what I want to do in Python and then, if it makes sense, create a C API.

> But anyway, let's rephrase "would all create instances of objects" as "would all return instances of objects". If there's anything to compute to put into such objects, they would need to be created too. But if all info is somehow precomputed, it wouldn't need to create them, just return such precomputed objects (stored in ROM for example).

Which brings us full circle to my original implementation in #512, which did just that. IIRC it took about 16K of flash to describe the registers for the 14 timers.

@dhylands
Contributor

> Ok, so this is another solution to the problem, which will need to allocate some RAM, but will allow benefiting from reusing a layout structure for similar blocks at different memory locations.
>
> Well, the big concern is concurrent access to such a selector. It should work for many cases, but when it breaks, it will be quite unpleasant. So there should be some way to detect such concurrent (or just 2nd-in-a-row) access to a selector and throw an error. Intuitively that should be doable; the details should be thought out in the actual implementation, though.

I was assuming that each time you asked for timer0.top_level it would return a new selector. So then there are no issues with concurrent access (at least no more issues than when accessing any other data structure).

With the ISR, the main driver would need to pre-instantiate any objects that the ISR wants to use (regardless of how it's implemented), and the main driver would use a different selector in order to avoid any concurrency issues.

@dhylands
Contributor

> And I would again like to reiterate that I don't like direct ctypes syntax (which uses a class definition around a layout definition). Just think how ctypes implements it: via metaclasses. So there's a "standard" class definition, which (roughly speaking) is passed to some function to be processed at runtime to generate a direct layout structure with which ctypes internals can work. This is an extra, superfluous step. Instead, I was contemplating what such a layout structure would be, one on which a memory access module can work directly and which, at the same time, would be expressible using standard Python data types.

I don't need all of the metaclass machinery of ctypes, since the "selector" would be the thing that actually does the load_attr/store_attr.

If I were to use ctypes for Structures/Unions/Arrays, I would only declare the classes; they would never actually get instantiated. The class is the layout. Actually, since that's probably the only aspect of ctypes I would use, let's just say I'll have a Structure, Union and Array class. Maybe Array can be done without a class.

I like your idea of using a dict rather than an array, but I'm not sure how you would use an OrderedDict. You'd have to do:

d = OrderedDict()
d['key1'] = val1;
d['key2'] = val2;

I don't see any equivalent way of doing:

d = { 'key1' : val1, 'key2' : val2 }

Oh wait - it looks like OrderedDicts can be initialized from an array. That works.

d = OrderedDict([('key1', val1), ('key2', val2)])

So then Structure and Union could just be subclasses of OrderedDict (so Selector can tell them apart) or they could be classes with an OrderedDict member. Either way works for me. I could prototype using a class with a regular dict as a member for the time being, and replace it with OrderedDict once that becomes available. For registers, I'm assuming that the offsets will be pre-calculated.
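
A small sketch of the subclassing option, using CPython's collections.OrderedDict. The field names and format strings are invented for the example, and offsets are left out for brevity.

```python
from collections import OrderedDict


class Structure(OrderedDict):
    """Layout whose fields follow one another in declaration order."""


class Union(OrderedDict):
    """Layout whose fields all overlap at the same starting offset."""


# Built from a list of pairs so that the field order is preserved.
REG = Structure([
    ("ctrl", "<H"),
    ("status", Union([("word", "<H"), ("low_byte", "<B")])),
    ("count", "<I"),
])

field_order = list(REG)

# A traversal helper can tell the two kinds of layout apart by type.
status_is_union = isinstance(REG["status"], Union)
```

Keeping the classes empty means they carry no behaviour of their own; they act only as tags on an otherwise ordinary ordered mapping.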

@pfalcon
Contributor Author

pfalcon commented Jun 27, 2014

I forgot to say that I've been working on such a module (codenamed "sstruct", from "super-struct") lately, based on ideas outlined in #512 (comment), and the first version should be up for review soon.

@pfalcon
Contributor Author

pfalcon commented Jun 27, 2014

#718 is on critical path for this.
#722 is required for full-featured implementation (not required for v1.0)

@pfalcon
Contributor Author

pfalcon commented Jun 27, 2014

@dhylands:

> Oh wait - it looks like OrderedDicts can be initialized from an array. That works.
>
> d = OrderedDict([('key1', val1), ('key2', val2)])

Yep, that would work, but of course it's not 100% efficient. Ideally,

OrderedDict({'key1': val1, 'key2': val2})

should just produce a constant OrderedDict laid out as specified by the user. I submitted #722 for that.

@dpgeorge
Member

The uctypes module was implemented to give a generic interface to foreign data.
