extmod: implement basic "uevent" module (RFC, WIP) #6125

Open
wants to merge 4 commits into master

Conversation

@dpgeorge (Member) commented Jun 8, 2020

Background: the aim is to make events in MicroPython more generic and efficient, rather than having busy polling (based on POSIX poll) and being restricted to stream IO. The main use is for the uasyncio module, eg so that external interrupts like pin edges can wake the asyncio scheduler, but the event sub-system should not be tied to uasyncio.

Already there is #6056, #6106 and #6110. This PR takes the proof-of-concept PR #6110 and extracts out just the "uevent" module and implements it for the stm32 and unix ports, to start with. At the moment this module is quite similar to the existing "uselect" module, at least on the surface. But the idea is to make "uevent" as efficient as possible on bare-metal (ie O(1) for all operations) and support arbitrary events, and the existing "uselect" API is too restrictive to achieve this goal.

What's done in this PR:

  • working implementation of uevent on stm32 and unix
  • uasyncio changed to use uevent instead of uselect
  • docs for uevent (still to be properly written)

There's no real functional change here, the aim is to switch to uevent in a seamless way and then gradually improve it.

For now the uevent module should be considered "private" because its interface may change.

Any ideas/comments on improvements or alternatives are welcome.
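
For a feel of the intended usage, here is a rough Python-level sketch (it assumes a uselect-style poll() constructor; stream and handle are placeholders, and the interface is still in flux):

# sketch only: the uevent interface is private and may change
import uevent

poller = uevent.poll()
poller.register(stream, 1 << 0)  # register an object of interest with an event bitmask

# block for up to 100 ms; iterating the result yields the entries that became
# ready (it may be an empty iterator if the timeout expires first)
for entry in poller.poll_ms(100):
    handle(entry.obj)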

@dpgeorge (Member, Author) commented Jun 9, 2020

@andrewleech you may want to try testing this with your use of uasyncio. This is merge-ready and should work...

@tve (Contributor) left a comment

Phew, not easy to grok ;-)
I need to go back to the previous PR to understand why you're leaving the juicy stuff out...

I'm struggling with the ioctls being built-in here. I understand that this PR is just one step and things will change further. I'm also trying to understand the abstractions... The most confusing part is: what does the object registered with a poller have to satisfy in terms of properties and interface?

In all cases, I believe it must satisfy some notion of identity such that registering the same "thing" twice only does it once (there's an implicit assumption in the self->entries[i].base.obj == args[1] check in register()).
But the more interesting part is how it must/may signal an event. I believe there are three cases:

  1. each object has or provides a poll() method that can be called to busy-poll the object (this is what moduevent_native implements with the poll() method being hard-coded to ioctl and its peculiar args)
  2. all objects are managed by a port-provided "default" poller function that is used to poll all objects at once (this is what moduevent_poll implements with the system poll() being that default function)
  3. each object spontaneously (e.g. via interrupt) signals events (no implementation of that yet)

I wonder whether there is value in writing generic implementations of these three cases and having port-defined macros where the specific functions need to be called instead of having each port write its own version. Maybe that ends up being too unwieldy.

I think overall what I like the least is the fact that there are still two nested scheduling loops: the uasyncio one and the poll_ms one. It would be much better IMHO to start consolidating these by allowing poll_ms to return an "empty iterator" so the caller has a chance to check everything it needs to.

NB: I've resolved a bunch of the questions posed in my comments in the code, I'm leaving them because it may help you decide where some comments could speed up the understanding of future reviewers.


.. module:: uevent
:synopsis: wait for events on a set of objects

Contributor:

Worth some intro?

The main purpose of the uevent module is to enable fully event-driven operation using uasyncio, such that whenever any form of event needs to be awaited, it can be done by pausing the processor until the event occurs. The benefits are that the processor and other parts of the system can be put to sleep to save power, and that an event, when it comes in, gets handled immediately without having to wait for a polling loop to come around.

The uevent functionality is loosely patterned after the select and selectors modules with the major distinction that it is not tied to file descriptors or stream I/O. Instead, an event is an abstract concept represented by a ... [oops, can't continue until I actually understand some more...]

Member Author:

Yes, that sounds pretty good. Will add.

Methods
~~~~~~~

.. method:: poll.register(obj[, eventmask])
Contributor:

What is obj? I assume it has to have a certain interface?

Member Author:

It depends on the port. On unix it needs to respond to MP_STREAM_GET_FILENO. On bare-metal in the current code it must implement the MP_STREAM_POLL ioctl. But when it moves to an event-based system the obj needs to respond to MP_STREAM_SET_EVENT_CALLBACK (or whatever it'll be called).
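
For a pure-Python object today that looks roughly like this (a sketch of the current uselect-style requirement, not the final uevent interface; the constant values are the usual ones from py/stream.h):

import io

MP_STREAM_POLL = 3        # stream ioctl request
MP_STREAM_POLL_RD = 0x0001

# sketch: a Python object pollable via the MP_STREAM_POLL ioctl, as user
# classes already do for uselect/uasyncio; on unix it would instead need
# to provide a file descriptor (MP_STREAM_GET_FILENO at the C level)
class MySource(io.IOBase):
    def __init__(self):
        self.have_data = False

    def ioctl(self, req, arg):
        if req == MP_STREAM_POLL:
            # arg is the mask of events the caller is interested in
            return MP_STREAM_POLL_RD if (self.have_data and arg & MP_STREAM_POLL_RD) else 0
        return 0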

Contributor:

Can some of your reply be captured in this doc or perhaps in the code since the interface described in the doc is still in flux?

self.data[idx] = None
self.unregister(1 << idx)

def _enqueue(self, s, idx):
Contributor:

what is idx?
rename to flag_idx?
// flag_idx is an index into the flag bits ??

Member Author:

Yeah it's just the log2 of the event bit, so it's possible to index into the data list (I wanted to use small names here to reduce code size, because they allocate a qstr in the frozen code).

Contributor:

then perhaps:
// idx: log2 of event bit
(when I first read this I wasn't sure whether this was the file descriptor index or what...)

@@ -1315,6 +1315,14 @@ typedef double mp_float_t;
#define MICROPY_PY_UCTYPES_NATIVE_C_TYPES (1)
#endif

// Whether to provide the "uevent" module, and which implementation to use
#define MICROPY_PY_UEVENT_IMPL_NONE (0)
#define MICROPY_PY_UEVENT_IMPL_NATIVE (1)
Contributor:

What does "Native" mean here?

Member Author:

That there's no underlying OS, so it could be MICROPY_PY_UEVENT_IMPL_NOOS or MICROPY_PY_UEVENT_IMPL_BARE.

struct _mp_obj_poll_t {
mp_obj_base_t base;
unsigned short alloc;
unsigned short len;
Contributor:

// number of entries allocated ?
// number of entries filled ?

Member Author:

yes

mp_obj_base_t base;
mp_obj_poll_t *poller;
mp_obj_t obj;
uint16_t user_flags;
Contributor:

// copy of ready_flags: set of events on this object being signaled to the user

Member Author:

ready_flags doesn't exist on unix, it's more like "set of events ready on this object since the last time the user got them by iterating the poller"

Contributor:

Aha. Mind adding that as a comment?

def _enqueue(self, s, idx):
entry = self.poll.register(s, 1 << idx)
if entry.data is None:
entry.data = [None, None, None]
Contributor:

// entry.data is used to hold the tasks waiting for reading, writing, and error respectively

Member Author:

Actually there are no errors in uevent.... anything that has an error is readable and writable. The 3rd entry is for a generic event (eg pin edge).

For something like a UART you need all 3: read, write and events like RXIDLE.
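
So a commented version of the helper above might read (the comments and the cur_task detail are my guesses, based on the snippet and this thread):

def _enqueue(self, s, idx):
    # idx: 0 = wait-for-read, 1 = wait-for-write, 2 = generic event (e.g. pin edge, UART RXIDLE)
    entry = self.poll.register(s, 1 << idx)  # 1 << idx is the event bit for this slot
    if entry.data is None:
        entry.data = [None, None, None]      # one waiting task per slot: [reader, writer, event]
    entry.data[idx] = cur_task               # park the current task until that event fires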

Contributor:

aha! then adding a (corrected) comment would be doubly helpful ;-)

# print('(poll {})'.format(dt), len(_io_queue.map))
_io_queue.wait_io_event(dt)
# print('(poll_ms {})'.format(dt))
_io_queue.poll_ms(dt)
Contributor:

Following up on the above comment about moving mp_handle_pending or MICROPY_EVENT_POLL_HOOK here, doing so would provide more flexibility. I forget what the ph_key of a ready task looks like, so it may require an additional check when recalculating dt.

Member Author:

See reply above about that. This call should sleep as long as possible.

mp_obj_t obj;
uint16_t user_flags;
uint16_t user_method_name; // a qstr
mp_obj_t user_method_obj;
Contributor:

user_method_name and _obj seem unused? Maybe I missed something.

Member Author:

They are used by mp_uevent_poll_entry_attr

Contributor:

Yes, that has getters and setters, but I don't see them being actually used anywhere. They were used in #6110. Maybe I'm overlooking something.

}
mp_obj_poll_entry_t *entry = NULL;
for (size_t i = 0; i < self->len; ++i) {
if (self->entries[i].base.obj == args[1]) {
Contributor:

there's an implicit assumption here that this equality test is sufficient. For file descriptors it is. But looking ahead, it assumes that Pin(12) is a true singleton object, so that if I call Pin(12) in another part of the app and register that too, the equality check is still sufficient. Ughh, I hope this makes sense. I don't think this is necessarily incorrect or bad code, but it might be useful to flag this assumption.

Member Author:

Yes it's essentially an "is" comparison in Python language, which should be enough. A port should ensure that Pin(12) here and there is the same object.

But the idea is to make this much more efficient and query the object to see if it's already registered, i.e. whether it already has its event callback set to the correct value (see #6110). This will be a more correct test because it relates to the event itself rather than the object.
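
In other words, the current check leans on something like this holding on each port (Pin(12) is just the example from this thread; whether a port actually interns its Pin objects is port-specific):

from machine import Pin

p1 = Pin(12)
p2 = Pin(12)
# register() relies on this identity: the same physical pin must map to the
# same object, so registering "Pin(12)" twice only registers it once
assert p1 is p2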

Contributor:

// python "is" comparison on the object assumed to be sufficient for now

@dpgeorge (Member, Author) commented Jun 9, 2020

I need to go back to the previous PR to understand why you're leaving the juicy stuff out...

I wanted to take it one step at a time. It's too error prone to make large sweeping changes.

If the direction here is agreed upon I can add some more commits here implementing the additional event-based items. Then this PR would basically be a cleaned up version of #6110.

The most confusing is what does the object registered with a poller have to satisfy in terms of properties and interface?

As mentioned above, it depends on the port: either it's a file descriptor, or something that responds to MP_STREAM_POLL. On Zephyr it'll be slightly different due to how its k_poll() works.

I wonder whether there is value in writing generic implementations of these three cases and having port-defined macros where the specific functions need to be called instead of having each port write its own version. Maybe that ends up being too unwieldy.

I factored out all the common code (so far) into extmod/moduevent.c. I think a lot of the ports are going to be different (stm32, esp32, zephyr, unix) and will have their own subtle differences in implementation, so I don't think it's worth trying to make it generic at this point.

I think overall what I like the least is the fact that there are still two nested scheduling loops: the uasyncio one and the poll_ms one. It would be much better IMHO to start consolidating these by allowing poll_ms to return an "empty iterator" so the caller has a chance to check everything it needs to.

poll_ms() may already return an empty iterator if the timeout expires without any events. That's why uasyncio needs a loop, to test if anything became runnable due to a timeout. As mentioned above, under what other conditions would poll_ms() return an empty set of events?

I wanted to keep uevent separate from uasyncio so that the former could be used in any async implementation (eg micro-trio) or even just used raw by the user instead of uselect. So uevent doesn't know about timeouts possibly leading to runnable tasks, or what an IO event leads to. That decision is made by the uasyncio loop.

@tve (Contributor) commented Jun 9, 2020

I think it would be really helpful to spell out the semantics of and requirements on uevent in plain english. One red flag to me is that MICROPY_EVENT_POLL_HOOK shows up in one poll_ms and mp_handle_pending shows up in the other. The macro in particular is a convenient way to sweep unresolved ugly stuff under the rug ;-). If uevent is the new way to wait for events and it's port-specific then why does it have to call additional escape hatch stuff?

My gripe here really is that to arrive at a clean system the semantics need to be clear and the use of MICROPY_EVENT_POLL_HOOK pulls in a lot of implicit assumptions and unknown unknowns... Maybe using the macro is the right thing to do in the end but right now it seems muddling.

The issues I'm referring to are: what type of stuff needs to be polled or checked outside of what uevent already polls or checks? If there is such stuff, then how does it interface with uevent? Implicitly, having "other polled stuff" says that there are some form of events outside of uevent: why is this necessary, and if there are, how are they supposed to "collaborate" with uevent, for example to cause uevent to return? I think that spending some time on making all this explicit and cleanly described is worth it in the long run as it will lead to cleaner code.

An example that sends my head spinning: on the esp32 MICROPY_EVENT_POLL_HOOK checks for events; but on STM32 MICROPY_EVENT_POLL_HOOK actually suspends (thread yield or WFI). As a result poll_ms on stm32 doesn't sleep itself but uses the MICROPY_EVENT_POLL_HOOK macro, while poll_ms on unix doesn't use the macro and sleeps itself by calling the poll system call. MICROPY_EVENT_POLL_HOOK of course is used in other places in the codebase... Wouldn't it be nice to clean these things up a bit when it comes to uevent, even if that means duplicating some code for now?

Let me try to formulate some specs for uevent. More precisely, specs for the union of all uevent implementations, setting aside whether there should be a generic implementation of uevent or a separate one per port and whether some implementations only support a subset of functionality.

  • When it's being used, uevent becomes the central place where a thread blocks waiting for events external to the thread to come in and unblock the thread
  • uevent represents potential events using registrations, each registration consists of an object that represents the event source and a number of bit-flags that represent distinct events the source can produce
  • objects used in a registration can be of three different kinds: explicitly polled ones, globally polled ones, and fully-event driven ones
  • explicitly polled objects must provide a check() method that checks the event source and returns bit-flags to signal any event that happened
  • globally polled objects are polled all at once using a system function like select or poll/epoll, so they don't each need a check() method; instead they need to provide an idx property that is used by the system function (typically this is the file descriptor number)
  • fully event-driven objects are not polled and instead must otherwise arrange for the event source to be checked, for example using a hardware interrupt or another thread/task; if an event occurs these objects must call uevent with the appropriate bit-flags to signal the event (as I write this it seems obvious to me that it's uevent's business to provide a signal() method for this that can be called from ISRs and/or other threads/tasks, and that ends any sleep uevent may be in)
  • the semantics of poll_ms are that it loops over the following (the order is most likely not correct; see the pseudocode sketch after this list):
    • checks whether any events are ready (for example due to signal() having been called)
    • loops over all explicitly polled objects and calls their check() method
    • calls the global poll method
    • returns if there are ready events or the timeout of the call has expired
    • pauses the thread/task/processor until it's worth iterating again
  • in the above:
    • the pausing (last bullet item) may be combined with calling the global poll method
    • the pausing may be for a very short period if there are explicitly polled objects and it may be for the full duration of the timeout if there are none
  • is an additional escape hatch bullet item needed in the above list, e.g. poll_ms calls MPY_UEVENT_ESCAPE_HATCH so "additional stuff can be kept moving while the thread is trying to sleep"?
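
In pseudocode, the loop I have in mind is roughly this (purely illustrative; all helper names are invented here):

from time import ticks_ms, ticks_diff, ticks_add

def poll_ms(self, timeout_ms):
    deadline = ticks_add(ticks_ms(), timeout_ms)
    while True:
        ready = self._collect_signalled()   # events already flagged via signal()
        for obj in self._explicitly_polled:
            ready |= obj.check()            # busy-poll the check()-style objects
        ready |= self._global_poll(0)       # e.g. system poll()/select with zero timeout
        dt = ticks_diff(deadline, ticks_ms())
        if ready or dt <= 0:
            return ready                    # possibly empty on timeout
        self._pause(dt)                     # WFI / RTOS wait / blocking system poll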

There's also mp_handle_pending... Right now it seems there is some implicit contract between the mp_sched stuff and uevent's poll_ms. Kind'a "if you iterate then run mp_handle_pending" and "oh, by the way, the mp_sched stuff probably ensures that you somehow iterate when it needs you to run mp_handle_pending (for example thanks to EINTR)".

IMHO it would be very useful to discuss what mp_sched really is and what its semantics should be given all these changes. It seems to be "run this piece of code on the next available thread/task that runs the MP interpreter at a bytecode boundary". So in a threaded context should it wake up all blocked/sleeping MP threads? One of them (which)? Or should there be a separate MP thread just for mp_sched? (Such a thread could have high priority thereby reducing latency, e.g. on esp32.) If a wake-up is needed, shouldn't mp_sched call into uevent, possibly a "wake_up_for_mp_sched()" function, and let uevent encode how that's supposed to be implemented (even if it's a no-op)? As opposed to mp_sched doing something that indirectly/implicitly happens to wake up uevent?

Ugh, there I go again producing a wall of text... I hope that what comes across is that I really value a clear statement of semantics/requirements/assumptions/etc. (doesn't need to be formal). It helps the next person understand what goes on and enables them to make changes or additions. It also helps produce clean code and fewer bugs. One of the benefits I could see coming out of this is that it could allow most scheduling-related things to get concentrated inside uevent, so it can be reasoned about within that box, as opposed to having bits and pieces in a (growing) number of places in the codebase.

@jimmo (Member) commented Jun 10, 2020

For now the uevent module should be considered "private" because its interface may change.

It's quite possible that someone might already have an event module in their project (e.g. event.py), and so with weak links this could get confusing.

Perhaps this should be named _uevent (not a huge fan of that though), or maybe we should be "namespacing" the micropython-specific or private modules. One option would be to make it micropython.uevent (i.e. using #5701).

@dpgeorge (Member, Author):

The issues I'm referring to are: what type of stuff needs to be polled or checked outside of what uevent already polls or checks? If there is such stuff, then how does it interface with uevent? Implicitly, having "other polled stuff" says that there are some form of events outside of uevent: why is this necessary, and if there are, how are they supposed to "collaborate" with uevent, ...

I'll try to clear up this bit first, because I think it gets to the crux of the issue. There are currently 3 levels of execution in uPy:

  • Hard interrupts: fully preemptive, code (Python code or dynamically loaded native code) is executed by the interrupt routine itself. This gives minimal latency but means that it can interrupt anything, eg a dict update or GC cycle. So the code that runs is constrained as to what it can do. One must be careful using this feature, but it is powerful when used correctly, and IMO an important feature for microcontrollers.
  • Soft interrupts: preemptive at the level of VM bytecodes, as though the currently executing Python function were paused part way through execution (at a bytecode boundary) and the soft interrupt callback executed in its place, before returning control to the main code. This is emulating how (hard) interrupts look/behave in a pure C program: code is suspended at a machine instruction boundary, the state is pushed to the stack, the IRQ executed using that stack, then it returns and resumes the "thread level" C function. Such soft interrupts are relatively low latency (they don't need to wait long for a bytecode boundary) and can execute any Python code. Note that hard interrupts can preempt soft interrupts, but a soft interrupt can never preempt another soft interrupt.
  • Main top-level (thread level) Python execution. Hard and soft interrupts can preempt this level.

It's the VM/runtime's job to manage these 3 levels of execution transparently to the user, and to the other levels. Ie the top-level doesn't need to manage execution of pending soft interrupts.

Hard interrupts can spawn soft interrupt callbacks via micropython.schedule(...). This is done automatically behind the scenes if an irq is configured as soft (the default, eg pin.irq(cb)).

Communication between levels (usually top level to soft/hard interrupt) can be done by using Python variables, eg setting a flag in the interrupt and testing it at the top level.
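
For example (the usual pattern today; the pin number and names are arbitrary):

from machine import Pin

flag = False

def on_edge(pin):
    global flag
    flag = True                 # runs as a soft callback, scheduled behind the scenes

Pin(0, Pin.IN).irq(on_edge)     # soft irq is the default

# top-level code just tests the flag whenever convenient
if flag:
    flag = False
    # ... handle the edge ...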

Soft/hard interrupts are, pretty much exclusively, triggered by an external event, eg UART incoming character, pin edge change, timer expiring, BLE event. On unix it would be a signal, eg SIGPIPE. These events need to be processed independently of whatever the main top-level code is doing.

Coming to uevent: this is intended to be a user interface for waiting efficiently for a set of events of interest. Events of interest are registered by the user and when one of those is ready then poll_ms() will return. Any other events (soft/hard interrupts) that the user has configured should be independent of what is registered with uevent. And these soft/hard callbacks should continue to be called transparently to the user regardless of the use of uevent.

So in essence there are actually 2 layers of "event" sub-systems:

  1. The "transparent layer" which executes registered soft/hard interrupts as soon as their event becomes available, eg pin.irq(cb), and the user is not aware of them (apart from the fact the code has executed at some point). This could be (but doesn't need to be) implemented by an independent thread which sits in an infinite loop polling on all events and executing their callbacks.
  2. The "user layer" which waits for events and passes control back to the user as soon as an event of interest occurs.

I don't think it makes sense to try and merge these 2 layers, ie for the "user layer" to be responsible for executing soft callbacks from the "transparent layer". I also don't think it makes sense for the "transparent layer" to implicitly wake the "user layer" (explicitly, yes, eg via uevent.notify()).

@peterhinch (Contributor) commented Jun 10, 2020

Please could you clarify how these statements align.

Soft interrupts: preemptive at the level of VM bytecodes

From the docs Using micropython.schedule:

The callback is also guaranteed to run at a time when the main program has completed any update of Python objects, so the callback will not encounter partially updated objects.

A MicroPython instruction can map onto many bytecodes: how does the VM ensure that the consequence of preempting at the bytecode level is invisible at the Python level?

Why does this not (#6106) automatically imply that Event.set() can safely be called from a soft ISR?

@tve (Contributor) commented Jun 10, 2020

Damien, thanks for the long reply! I've also wondered about the question Peter has asked. I believe soft interrupts are not checked strictly after every bytecode but only after ones that occur at the end of a statement, or something like that. But I could be very wrong!

I don't think it makes sense to try and merge these 2 layers, i.e. for the "user layer" to be responsible for executing soft callbacks from the "transparent layer".

I didn't mean to suggest this. I'll rephrase what I'm trying to do by saying that I'm trying to make the communication between the transparent layer and the user layer explicit and to make it clear how blocking and unblocking occur and whose responsibility is what. Besides simply overall clarifying what is happening I believe this is necessary to go to sleep because sleep is a global decision that affects everything at all layers and so global knowledge is necessary in some shape or form (very abstractly speaking).

I believe there is something important missing from the scheduling description: threads (and tasks). Are the following statements correct in the presence of python threads?

  • A soft-IRQ is executed on the next running thread (at the next bytecode boundary).
  • Enqueueing and dequeuing soft-IRQs are protected by a lock and by blocking interrupts (as appropriate depending on the port), enqueuing is thus hard-IRQ safe and generally thread/task safe. (One exception is that enqueue/dequeue are not multi-core safe on the esp32.)
  • When a soft-IRQ is scheduled, no explicit provision is made to cause a python VM thread to actually run, it mostly "just happens" (see below for more details).
  • The soft-IRQ machinery does not interact with uevent, thus if a soft-IRQ causes an event to be signaled it must cause uevent's poll_ms to "take note" (the uevent.signal() method, however it's called, could take care of this).
  • Similarly, if a soft-IRQ otherwise causes an asyncio (or trio, or other) task to be made runnable then the soft-IRQ itself must interact with uevent to cause poll_ms to return. Alternatively, to make a task runnable outside of uevent, a soft-IRQ should call some uasyncio function that itself interacts with uevent. Another alternative would be to outlaw making a task runnable without using uevent.
  • The last two bullet points apply not only to soft-IRQs but also to python threads and to non-python threads/tasks, i.e., if any of these signal a uevent or make an asyncio task runnable they need to interact directly or indirectly with uevent to cause poll_ms to return.

About causing a python thread to run when a soft-IRQ is scheduled, are the following statements correct? Note that to a large degree the issues below are not relevant today because of busy polling, these bullet points are not about possible issues in the code today but about what needs to be solved going forward.

  • On bare metal, ARM specifically, the processor is obviously running when it schedules the soft-IRQ and when whatever schedules the soft-IRQ returns (abstract notion of return) eventually this lands back in the VM. If the soft-IRQ was scheduled from a hard-IRQ that interrupted sleep (WFI) then the ARM wake-on-return feature (I forget the exact name) ensures that the VM runs at least long enough to check the soft-IRQ queue before it goes to sleep again (calling WFI).
  • On unix if the soft-IRQ is scheduled from a signal handler and the kernel runs the signal handler on a kernel thread that runs the VM then any blocking call the VM may have issued is aborted via EINTR, which causes the VM to check the soft-IRQ queue before it restarts the blocking call.
  • However, if the kernel runs the signal handler on a non-VM kernel thread or the soft-IRQ is scheduled from a non-VM thread (e.g. something spawns a C helper thread or MP is embedded in a larger non-python app) then the VM may not service the soft-IRQ until some unrelated event unblocks it. Note that this bullet point may only be relevant if kernel threads are used, I don't know anymore what pthreads on linux does...
  • On systems with an RTOS, if a soft-IRQ is scheduled from a hard-IRQ or from a non-VM task there is currently no provision to actually cause a VM task to run. If the system was asleep (WFI equivalent situation) when the hard-IRQ started there is no provision to prevent the processor to go back to sleep once the hard-IRQ completes.
  • On multi-core systems the above considerations can be more complex because they may require inter-processor communication and one of the cores could be asleep while the other runs. Plus threads/tasks/IRQ may have affinity constraints. Typically these issues are dealt with in the RTOS but MicroPython still needs to use the right primitives and take all relevant cases into consideration. (Note that on the esp32 MicroPython currently restricts itself to one core for these reasons.)

Let me try a couple more statements related to sleep (processor stops running instructions):

  • In a non-threaded bare metal system there are a number of functions whose purpose is to wait on a well-defined event or set of events (select, poll, mp_hal_stdin_rx_chr, ...), if they need to wait these functions can simply put the processor to sleep 'cause there's nothing else that could be done. When they wake up again, these functions must check the soft-IRQ queue.
  • In a threaded bare metal system functions that wait on events (general sense of the term, not uevent) can only suspend the current thread. Sleep mode can only be entered by the thread scheduler when it detects that all threads are suspended. It looks to me like this is not implemented and threads just keep yielding: https://github.com/micropython/micropython/blob/master/ports/stm32/mpconfigport.h#L385.
  • In an RTOS system (tasks assumed) the functions that wait on events need to call some RTOS-provided function that suspends the thread until a wait condition is satisfied. It is the RTOS that enters sleep mode when all (Python and non-Python) tasks are suspended.
  • In an RTOS system, in order to unblock a VM thread to cause it to run a soft-IRQ some RTOS-provided function needs to be called in order to end that thread's wait condition. This implies that it was somehow determined that no VM thread is runnable and that some VM thread was selected for wake-up.
  • Deep-sleep, as defined by a mode where execution state is lost and wake-up is similar to a reset, is different, all that is really needed is a function that "cuts the power" so to speak.

Another wall of text. My head is spinning :-). My overall suggestion is to make all of the above explicit in the code and to think about which layer is responsible for implementing what and have a clearly named function (or macro) for the purpose. Some of these functions/macros may be empty (or #-defined away for efficiency).

NB: I realize a lot of the above applies more to #6110 than this PR and is definitely forward-looking. I'm just continuing here 'cause we started here and I hope that at the end the information can be captured and put into the MP implementation notes or where appropriate.

@jcw commented Jun 10, 2020

As I understand it, bytecode granularity ≠ Python statement granularity. So i = i + 1 will not be atomic w.r.t. soft interrupts (and of course hard IRQs as well), whereas i += 1 probably is?

Damien, you say:

It's the VM/runtime's job to manage these 3 levels of execution transparently to the user, and to the other levels. Ie the top-level doesn't need to manage execution of pending soft interrupts.

There's perhaps a slightly different way to look at this. The CPU is an opcode-devouring loop, and the VM is a bytecode-devouring loop. For any kind of context change, these loops need to be suspended in some way. W.r.t. hardware interrupts, neither the VM nor any other part of the system can really do anything: they happen anyway. All a h/w IRQ can do is signal the software in some atomically safe fashion. At the gate level, every h/w IRQ is effectively picked up by the fetch/execute cycle in the CPU. Likewise, a soft IRQ happens (usually triggered by a h/w IRQ - or even always?) and then the "soft CPU", i.e. the VM, picks these soft IRQs up at its fetch/execute cycle, i.e. the VM loop.

Nothing new, but how you deal with this has major implications IMO for stacking of contexts (both C and Python) and for the information available to "schedule" low-power mode changes. If low-power scheduling is to become a major feature in MicroPython, then I think it makes sense to treat that goal as the central decision point for everything else. A power/speed/interrupt manager of some yet-to-be defined kind could "see" all the change-of-control path activity on a µC, from hard/soft interrupts, to timing queues, to the VM loop doing its thing. And not only see this, but plan and predict what's likely to happen and to be needed next. If I know there are no timeouts for the next 100 ms, and I've seen no UART interrupts, and no actual data arriving over USB, then I can decide to switch to the RTC clock, put the µC in a deep sleep mode which is still able to wake up for the stuff that matters, and reduce the power consumption dramatically further than in plain WFI/WFE (on ARM) sleep mode.

MP is not there at the moment. But the uevent module could be a big step to make that feasible in the future. It kind of does several of the things I just described. But some of the above also hinges on not integrating it too deeply into the VM. If the VM and the processing of all types of events could somehow be treated as equal partners, then this power/speed/interrupt manager would contain logic to decide to let the VM run for N cycles, or until a specific flag is set, and then drop out of its loop, so that the next decision can be made about which context is to run next.

With threads, I don't think this situation becomes drastically different. Unless all the sleeping and interrupt handling is done by the underlying RTOS anyway. Then all of MP becomes a slave task, with occasional soft interrupts fed into it.

Deep sleep, at the very lowest level, comes back as a full reset. That's basically the same as turning the chip off and powering it up again. It's also the mode with the highest possible power savings (well sub-µA). I'd love to see that work well with MicroPython. And here too, there is a benefit to adding a small amount of logic around the VM: the VM is done for whatever reason, it returns, leaving the complete C and Python context in a state which can be resumed. At which point this power/speed/interrupt manager thing decides it wants to go all out, saves state in a power-cycle resilient way (eeprom, backup sram, whatever), and turns the chip off, with care to get the wake-up logic right, so that it will get back control on power-up. It's a bit like suspend-to-disk, but with very different options in an embedded µC environment.

Anyway. If uevent can be the gateway to these capabilities in the future, IMO that'd be grand.

@dpgeorge (Member, Author):

Please could you clarify how these statements align.

Soft interrupts: preemptive at the level of VM bytecodes

From the docs Using micropython.schedule:

The callback is also guaranteed to run at a time when the main program has completed any update of Python objects, so the callback will not encounter partially updated objects.

A MicroPython instruction can map onto many bytecodes: how does the VM ensure that the consequence of preempting at the bytecode level is invisible at the Python level?

Why does this not (#6106) automatically imply that Event.set() can safely be called from a soft ISR?

Good questions... both of the statements are true. By "update of Python objects" it means things like list/dict insertion/deletion are atomic. But certainly x = a + b is not atomic because it's made up of multiple opcodes. For something like a dict insert d[key] = value this ends up being a single opcode (after evaluating d, key and value) so that is atomic.

For Event.set(): that's implemented in pure Python and made of many opcodes, so it is not safe for it to be interrupted by a soft interrupt/callback (at least in its current form). Eg it checks to see if the queue has anything on it, and if it does then it pops something from the queue. If this was interrupted right between the check and the pop then the code that interrupted it may also modify the queue and pop something, and then when it returns the pop in the top-level code could fail.
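
Schematically (not the actual uasyncio code, just the shape of the race):

# top-level code, part way through an Event.set()-style operation
if queue:                   # sees one waiting task
    # <-- a soft callback could run here and pop that same task from `queue`
    task = queue.pop()      # ...and then this pop fails (or pops the wrong thing)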

@dpgeorge (Member, Author):

I've also wondered about the question Peter has asked, I believe soft interrupts are not checked strictly after every bytecode but only after ones that occur at the end of a statement, or something like that

Yes, technically they are only checked after a jump/jump-if instruction, but you should code as though they could happen at any opcode boundary, because maybe one day the VM will be changed so they are checked more often (to reduce soft-interrupt latency).

  • A soft-IRQ is executed on the next running thread (at the next bytecode boundary).

The behaviour in the code is a bit undefined/ambiguous at the moment. First, one should really make the distinction between GIL and no GIL (uPy can work without a GIL because it doesn't have reference counting). With a GIL it pretty much reduces to the single-threaded case because threads can only execute sequentially, and the soft-irq must execute with the GIL. So, yes, a soft-irq is executed in the current thread's context, at the next (jump) opcode boundary. Without the GIL there are other locks (qstr and gc) which must be obtained by any running thread or soft-irq.

Actually, in a threaded system without the GIL there are no guarantees on atomic Python operations because threads can be truly running in parallel. So soft-irqs also have no guarantees on atomic operations like dict/list updates. In this case you need to explicitly manage any communication between threads (and soft-irq <-> thread) using locks from _thread.allocate_lock().

Well, if we wanted very low-latency soft-callbacks, this no-GIL behaviour could be extended to non-threaded systems. That is, soft-callbacks would have no guarantees on atomic operations, or execution at opcode boundaries. Instead the only guarantees would be: qstr and GC operations are atomic, because they are global operations that update global state. And soft-irq callbacks can still allocate heap memory (because GC is atomic). Everything else, all other data structures, must be managed through explicit locks by the user. Changing the behaviour to this would be a discussion for elsewhere (and I'm not suggesting to do it at this point...).

  • Enqueueing and dequeuing soft-IRQs are protected by a lock and by blocking interrupts (as appropriate depending on the port), enqueuing is thus hard-IRQ safe and generally thread/task safe. (One exception is that enqueue/dequeue are not multi-core safe on the esp32.)

Correct.

  • When a soft-IRQ is scheduled, no explicit provision is made to cause a python VM thread to actually run, it mostly "just happens" (see below for more details).

Correct.

  • The soft-IRQ machinery does not interact with uevent, thus if a soft-IRQ causes an event to be signaled it must cause uevent's poll_ms to "take note" (the uevent.signal() method, however it's called, could take care of this).

Correct. The soft-irq sub-system/machinery acts transparently to uevent, ie uevent does not know about it. And, yes, if a soft-irq callback triggers an event that uevent is waiting on, then that triggering must be explicit (and could be an anonymous trigger or a named one; in the latter case uevent would know what tasks to schedule as runnable).

  • Similarly, if a soft-IRQ otherwise causes an asyncio (or trio, or other) task to be made runnable then the soft-IRQ itself must interact with uevent to cause poll_ms to return. Alternatively, to make a task running outside of uevent a soft-IRQ should call some uasyncio function that itself interacts with uevent. Another alternative would be to outlaw making a task runnable without using uevent.

I'm not sure I fully understand this point/statement. But in the original idea with socketpair(), the behaviour was that a soft-irq would schedule runnable tasks immediately in the soft-irq, and then do an anonymous signal/notify to poll to wake it up. Poll doesn't know who woke it up and has nothing to do except return, assuming some task was put on the run queue. As such, moving tasks from one queue to another needed to be soft-irq safe.

This might not be the best way to do it. As you say, it's probably good to disallow making a task runnable without uevent. That is, all notifications to uevent must be "named" so that uevent knows what tasks to make runnable when it wakes up. Eg when you register an object with uevent it passes that object a "key" which it can use to notify the poller/uevent that the event triggered, and then all tasks associated with that "key" should be woken/made runnable.
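
Something along these lines (every name here is hypothetical, this API doesn't exist yet):

# hypothetical sketch of "named" notification
entry = poller.register(uart, EVENT_RX)
uart.set_event_callback(lambda: poller.notify(entry))  # the object keeps its "key"

# later, from a hard/soft irq: poller.notify(entry) marks the tasks waiting on
# `entry` runnable and wakes poll_ms() if it is currently blocked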

  • The last two bullet points apply not only to soft-IRQs but also to python threads and to non-python threads/tasks, i.e., if any of these signal a uevent or make an asyncio task runnable they need to interact directly or indirectly with uevent to cause poll_ms to return.

Yes. But they can use the same approach to signal uevent because it will be hard-irq safe.

@dpgeorge (Member, Author):

  • On bare metal, ARM specifically, the processor is obviously running when it schedules the soft-IRQ and when whatever schedules the soft-IRQ returns (abstract notion of return) eventually this lands back in the VM. If the soft-IRQ was scheduled from a hard-IRQ that interrupted sleep (WFI) then the ARM wake-on-return feature (I forget the exact name) ensures that the VM runs at least long enough to check the soft-IRQ queue before it goes to sleep again (calling WFI).

It's not always in the VM (the part that executes bytecode opcodes)... it could be in a blocking call like time.sleep() or a blocking socket read (for example). So a wake up of the machine will not get back to the VM before another WFI is executed, and that's exactly why MICROPY_EVENT_POLL_HOOK is needed in such waiting loops.

The VM itself knows nothing about WFI, sleeping or blocking. Only modules like select or time know that stuff.

  • On unix if the soft-IRQ is scheduled from a signal handler and the kernel runs the signal handler on a kernel thread that runs the VM then any blocking call the VM may have issued is aborted via EINTR, which causes the VM to check the soft-IRQ queue before it restarts the blocking call.

Aren't signals always executed on user threads, not kernel threads?? But, yes, any blocking kernel call made by the runtime will be interrupted by EINTR (due to a signal) and then the runtime has a chance to process the soft-irq that was scheduled by the signal handler.

  • However, if the kernel runs the signal handler on a non-VM kernel thread or the soft-IRQ is scheduled from a non-VM thread (e.g. something spawns a C helper thread or MP is embedded in a larger non-python app) then the VM may not service the soft-IRQ until some unrelated event unblocks it. Note that this bullet point may only be relevant if kernel threads are used, I don't know anymore what pthreads on linux does...

As long as the soft-irq callback is executed by some thread it doesn't matter which one (because the top-level Python code shouldn't know about execution of soft-irq callbacks). There's only a problem if no thread executes the callback (which may be a problem in the current unix port...).

  • On systems with an RTOS, if a soft-IRQ is scheduled from a hard-IRQ or from a non-VM task there is currently no provision to actually cause a VM task to run. If the system was asleep (WFI equivalent situation) when the hard-IRQ started there is no provision to prevent the processor to go back to sleep once the hard-IRQ completes.

Yes, there could be issues related to this. But it depends on whether the GIL is used or not. If it is used, then if the hard-irq executes when the GIL is released then the hard-irq can just acquire the GIL and run the soft-irq callback there and then. If the GIL cannot be acquired then there must be a VM task that is currently running, which can take care of running the scheduled soft-irq callback.

  • On multi-core systems the above considerations can be more complex because they may require inter-processor communication and one of the cores could be asleep while the other runs. Plus threads/tasks/IRQ may have affinity constraints. Typically these issues are dealt with in the RTOS but MicroPython still needs to use the right primitives and take all relevant cases into consideration. (Note that on the esp32 MicroPython currently restricts itself to one core for these reasons.)

Yes, multicore needs more thought in this regard.

@dpgeorge (Member, Author):

  • In a non-threaded bare metal system there are a number of functions whose purpose is to wait on a well-defined event or set of events (select, poll, mp_hal_stdin_rx_chr, ...), if they need to wait these functions can simply put the processor to sleep 'cause there's nothing else that could be done. When they wake up again, these functions must check the soft-IRQ queue.

Correct. Although there may be other IRQs which are registered outside MicroPython, like low-level systick, USB or UART rx buffer handlers. The system cannot go to sleep if any of these are active but unable to wake it from that sleep level (they can all wake it from WFI, but not all from lightsleep mode). The system must keep track of which IRQs are active and what level of sleep can be achieved at any given point.

  • In a threaded bare metal system functions that wait on events (general sense of the term, not uevent) can only suspend the current thread. Sleep mode can only be entered by the thread scheduler when it detects that all threads are suspended. It looks to me like this is not implemented and threads just keep yielding ...

Correct. But on stm32, if nothing is runnable then it does do a WFI, see pybthread.h:pyb_thread_yield().

  • In an RTOS system (tasks assumed) the functions that wait on events need to call some RTOS-provided function that suspends the thread until a wait condition is satisfied. It is the RTOS that enters sleep mode when all (Python and non-Python) tasks are suspended.

Correct.

  • In an RTOS system, in order to unblock a VM thread to cause it to run a soft-IRQ some RTOS-provided function needs to be called in order to end that thread's wait condition. This implies that it was somehow determined that no VM thread is runnable and that some VM thread was selected for wake-up.

Yes that sounds about right... but this is actually implemented on esp32 using mp_hal_wake_main_task_from_isr.

  • Deep-sleep, as defined by a mode where execution state is lost and wake-up is similar to a reset, is different, all that is really needed is a function that "cuts the power" so to speak.

Correct. machine.deepsleep() doesn't really have much to do with this discussion. It's all about machine.lightsleep().

@tve (Contributor) commented Jun 11, 2020

Thanks for all the clarifications! I had not drilled down to pybthread.h:pyb_thread_yield() so that's now on my to-do list :-)

On unix if the soft-IRQ is scheduled from a signal handler and the kernel runs the signal handler on a kernel thread that runs the VM then any blocking call the VM may have issued is aborted via EINTR, which causes the VM to check the soft-IRQ queue before it restarts the blocking call.
Aren't signals always executed on user threads, not kernel threads?? But, yes, any blocking kernel call made by the runtime will be interrupted by EINTR (due to a signal) and then the runtime has a chance to process the soft-irq that was scheduled by the signal handler.

Kernel threads are entities that are scheduled by the kernel (they can be thought of as light-weight processes that all share the same address space). User threads are entities that run in the context of a kernel thread and that are scheduled by a user-level thread library (user threads are multiplexed onto a kernel thread). Pthreads is an interface that can be implemented by using a kernel thread for each pthread, or by using a single kernel thread and implementing all pthreads at user level, or by a combination of both (N pthreads using N user threads that are multiplexed onto M kernel threads).

The reason this is relevant here is that we're talking about unblocking kernel operations. The tricky situation is the following (I'm not 100% sure I got all the details right):

  • suppose you have two MicroPython threads and pthreads creates two kernel threads for that
  • in thread A MicroPython calls the poll() system call, this blocks the kernel thread
  • a signal is sent to the MP process, the kernel decides that thread B is going to handle it
  • the signal handler runs in thread B and doing so interrupts any system call that thread B may be stuck in using EINTR
  • thread A's system call is not interrupted as far as I know

What this highlights is that one either has to really understand the precise semantics of the poll system call and EINTR in the context of pthreads (since that's what MP uses) or one has to always assume the worst-case and use a solution like the socketpairs to force thread A in the example above to unblock.

@peterhinch (Contributor):

But certainly x = a + b is not atomic because it's made up of multiple opcodes. For something like a dict insert d[key] = value this ends up being a single opcode (after evaluating d, key and value) so that is atomic.

This makes it hard to reason about the behaviour of scheduled functions. For example a forum user asked if it is safe to modify a uheapq instance in a scheduled function. I can't see how to answer this question without studying source or disassembling code.

There is clearly a tradeoff between latency and usability. Soft IRQs need to be as fast as possible, and it's reasonable to expect users to take precautions.

In the case of micropython.schedule I would suggest that pre-emption at a Python instruction boundary would improve usability. An explicit call to micropython.schedule implies an acceptance of latency. In many cases (e.g. uasyncio) the latency is unlikely to be significant. This could be taken further if we had micropython.critical(set) which allowed the declaration of critical sections. These would preclude pre-emption by scheduled functions (but would be ignored by ISRs). This would fix the Event.set problem.

@dpgeorge (Member, Author):

If low-power scheduling is to become a major feature in MicroPython, then I think it makes sense to treat that goal as the central decision point for everything else.

Yes this is a longer term goal. uevent is part of this and can help because it gives more information about what events are pending. But the system does need to keep track of more things, basically all active IRQs and whether they can wake the MCU from a lower power state.

A power/speed/interrupt manager of some yet-to-be defined kind could "see" all the change-of-control path activity on a µC, from hard/soft interrupts, to timing queues, to the VM loop doing its thing. And not only see this, but plan and predict what's likely to happen and to be needed next. If I know there are no timeouts for the next 100 ms, and I've seen no UART interrupts, and no actually data arriving over USB, then I can decide to switch to the RTC clock, put the µC in a deep sleep mode which is still able to wake-up for the stuff that matters, and reduce the power consumption dramatically further than in plain WFI/WFE (on ARM) sleep mode.

Yes this describes well what I'm aiming for. It'll depend heavily though on the MCU/SoC and its sleep support, and also whether there's an RTOS in there.

Deep sleep, at the very lowest level, comes back as a full reset. ... the power/speed/interrupt manager thing decides it wants to go all out, saves state in a power-cycle resilient way (eeprom, backup sram, whatever), and turns the chip off, with care to get the wake-up logic right, so that it will get back control on power-up. It's a bit like suspend-to-disk, but with very different options in an embedded µC environment.

That would be a fancy feature but requires a lot of management of the SoC, eg saving the state of all peripherals.

@dpgeorge (Member, Author):

What this highlights is that one either has to really understand the precise semantics of the poll system call and EINTR in the context of pthreads (since that's what MP uses) or one has to always assume the worst-case and use a solution like the socketpairs to force thread A in the example above to unblock.

Yes, I think using socketpair is the way to go in a multi-threaded POSIX-like system. For example, you may have 2 independent uevent.poll_ms() calls in two separate threads, and a given signal would want to wake exactly one of those, and a specific one at that. But this is just an implementation detail of uevent.notify(...).
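
The underlying trick, as a CPython-flavoured sketch (the unix port would do the equivalent in C behind uevent.notify()):

import socket, select

r, w = socket.socketpair()      # one pair per poller
p = select.poll()
p.register(r, select.POLLIN)    # the poller always watches its own read end
# ... plus whatever the user registered ...

# from a signal handler or another thread: wake exactly this poller
w.send(b"\x01")

# inside poll_ms(): the blocking poll() returns because r became readable
events = p.poll(100)
r.recv(64)                      # drain the wake-up byte(s)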

@dpgeorge (Member, Author):

This makes it hard to reason about the behaviour of scheduled functions. For example a forum user asked if it is safe to modify a uheapq instance in a scheduled function. I can't see how to answer this question without studying source or disassembling code.

Code in scheduled functions can never be interrupted by another soft-scheduled callback. So you can do x = y + z in a scheduled function and be guaranteed it is atomic. The entire scheduled function is effectively atomic, just like a hard IRQ handler is in C land (assuming there's only 1 priority level, which is the case for soft-scheduled functions). Caveat: hard IRQs can interrupt soft IRQ handlers.

So it's definitely safe to modify a uheapq instance in a scheduled function. I guess the question is really whether uheapq operations must be protected at top-level (ie non-scheduled) code. Since uheapq is implemented in C its operations are all atomic wrt scheduling, so they don't need extra protection. But that's not true for pure Python functions.
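
For example (a sketch; the pin and trigger details are arbitrary and port-specific):

import micropython, uheapq
from machine import Pin

events = []                          # heap shared between scheduled code and top level

def record(t):
    uheapq.heappush(events, t)       # a single C call: atomic wrt soft scheduling

def isr(pin):                        # hard IRQ handler
    micropython.schedule(record, 123)    # defer the Python-level work to a soft callback

Pin(0, Pin.IN).irq(isr, hard=True)

# top-level code can pop without extra protection: a scheduled push that sneaks
# in between the check and the pop can only add items, never remove them
if events:
    t = uheapq.heappop(events)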

I'm not sure what the best thing is to do here regarding docs and specs... maybe we can define a set of operations/data structures that are always guaranteed to be atomic, and list them in the docs?

In the case of micropython.schedule I would suggest that pre-emption at a Python instruction boundary would improve usability

That's actually kind of how it's implemented at the moment: scheduled functions are only checked for execution at a jump opcode, which means most expressions are atomic. But I'm not sure it's a good idea to rely on this, or make it part of the spec/standard, because that locks us in to this implementation and forbids potential optimisations in the future.

This could be taken further if we had micropython.critical(set) which allowed the declaration of critical sections. These would preclude pre-emption by scheduled functions (but would be ignored by ISR's). This would fix the Event.set problem.

In PRs #6106 and #6110 I added micropython.scheduler_lock() / micropython.scheduler_unlock() exactly to solve this issue. Maybe these would be even better as a proper context manager, eg:

with micropython.scheduler_locked():
    ...
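
If those C-level functions stay as in those PRs, the context manager itself is a trivial Python wrapper (a sketch only: scheduler_lock()/scheduler_unlock() are from #6106/#6110 and not in mainline):

import micropython

class scheduler_locked:
    def __enter__(self):
        micropython.scheduler_lock()     # proposed in #6106/#6110, not in mainline
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        micropython.scheduler_unlock()
        return False                     # don't suppress exceptions

with scheduler_locked():
    ...                                  # no soft callbacks can run in here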

@tve (Contributor) commented Jun 12, 2020

In the case of micropython.schedule I would suggest that pre-emption at a Python instruction boundary would improve usability.

There is no such thing as a python instruction. You can talk about statements, or simple (non-compound) statements, or expressions, or expression atoms, etc. Pretty much all of these can have function calls and the moment you call another python function all bets are off. So I don't think you can meaningfully define something at the python level to be atomic.

In PRs #6106 and #6110 I added micropython.scheduler_lock() / micropython.scheduler_unlock() exactly to solve this issue. Maybe these would be even better as a proper context manager.

I believe this is the only sane solution to the issue, and yes, a context manager would be nice!

@peterhinch (Contributor):

@dpgeorge @tve Points taken, thank you. I agree about the context manager. Should nested calls be considered (or disallowed)?

maybe we can define a set of operations/data structures that are always guaranteed to be atomic, and list them in the docs?

This seems to be the only option, along with documenting the context manager and the reasons for using it.

@dpgeorge (Member, Author):

Should nested calls be considered (or disallowed)?

Yes nesting of scheduler lock/unlock should be allowed, and it currently is because the underlying mp_sched_lock()/mp_sched_unlock() support it via a simple depth counter.

@projectgus (Contributor):

This is an automated heads-up that we've just merged a Pull Request
that removes the STATIC macro from MicroPython's C API.

See #13763

A search suggests this PR might apply the STATIC macro to some C code. If it does, then next time you rebase the PR (or merge from master) please replace all the STATIC keywords with static.

Although this is an automated message, feel free to @-reply to me directly if
you have any questions about this.
