extmod/uasyncio Event.set() not safe in ISR #5795

Open
kevinkk525 opened this issue Mar 25, 2020 · 11 comments
@kevinkk525
Contributor

If you call Event.set() in an ISR, you can end up losing all tasks that were awaiting the event (e.g. if the ISR executes while uasyncio is rearranging tasks on its queue).

I made a test script a while ago to demonstrate this: https://gist.github.com/kevinkk525/5b89395604da3df3be7a015abcba04fa

@tve
Contributor

tve commented Mar 25, 2020

I've been thinking about this one a bit and I would like to propose a solution that is broader than just Event, so it can also be used with sockets or other drivers. An example of another driver is the ESPNow interface, which works via callbacks that occur on "a high priority Wifi thread"; that reads to me like it could be on the PRO core, i.e. worse than ISRs running on the same core in terms of synchronization requirements (see #4115).

I believe that fundamentally what we need is an event facility (different from Event) that allows an ISR to flag something that the asyncio loop picks up at the next scheduling point and processes between scheduling points. I assume we want to keep the overhead this adds to each scheduling point to a minimum.

Proposal:

  • implement a fixed number of slots (e.g. 32) into which callbacks can be registered, i.e. typically each slot would contain a pointer to a function closure
  • implement a flag array with a bit for each slot that signals to the asyncio loop that the slot needs to be handled, meaning the loop should invoke the closure and reset the flag at the next scheduling point
  • implement an ISR-safe function to set a flag in the array, i.e. to signal a slot
  • the key to all this is the assumption that a bit in a machine word can be atomically set/cleared cheaply, even in a multi-core situation (the esp32 has an atomic compare-and-set for that, see https://esp32.com/viewtopic.php?t=5017)
  • note that the slots holding the function closure pointers need to be added to the GC roots if the whole structure isn't referenced at the Python level

In terms of semantics, the only guarantee is that after the ISR signals the slot, the callback is executed at least once by the asyncio loop. In particular, the asyncio loop may invoke the callback at any time even without an ISR signal (typically this would only happen due to a race condition).

Pseudo-code implementation:

class ISRSignal:
    def __init__(self):
        self.slots = [None for _ in range(32)]
        self.flags = 0
    # registerSignal adds a callback and returns the slot number assigned, or None if all slots are full
    def registerSignal(self, cb):
        for i, s in enumerate(self.slots):
            if s is None:
                self.slots[i] = cb
                return i
        return None
    # checkSignals should be called in the asyncio loop and invokes all callbacks that are pending
    def checkSignals(self):
        if self.flags == 0: return
        for i, s in enumerate(self.slots):
            if (self.flags & (1 << i)) == 0: continue
            atomic_bit_clear(self.flags, i)  # atomic_bit_set/clear stand for C-level atomic bit operations
            self.slots[i]()
    # signalFromISR flags the callback slot as pending for the next asyncio loop iteration
    def signalFromISR(self, slot_num):
        atomic_bit_set(self.flags, slot_num)
    # deregisterSignal frees a slot
    def deregisterSignal(self, slot_num):
        atomic_bit_clear(self.flags, slot_num)
        self.slots[slot_num] = None

Sample usage with Event:

    isr_signals = ISRSignal()  # typically a single shared instance
    self.ev = Event()
    def setMyEvent(): self.ev.set()
    slot = isr_signals.registerSignal(setMyEvent)
    initMyISR(..., slot=slot)

I believe an alternative would be to make mp_sched_schedule ISR and multi-core safe, but that looks a lot more tricky than the above to me.

@kevinkk525
Contributor Author

kevinkk525 commented Mar 25, 2020

It certainly is important to think about general ways of interacting with objects from an ISR; however, a solution with a fixed number of slots will always have the problem of either wasting RAM or not providing enough slots.

An easy solution (but one that comes with an overhead) for the Events could be something along these lines:

_event_changed_flag = False
_event_chain = None

class Event:
    def __init__(self, ...):
        self._next_event = None
        self._state_has_changed = False
        append_to_event_chain(self)  # links this Event into _event_chain

    def set(self):
        global _event_changed_flag
        self.state = True
        self._state_has_changed = True
        _event_changed_flag = True


# somewhere in the uasyncio main loop:
if _event_changed_flag:
    _event_changed_flag = False
    for event in _event_chain:  # walk the linked list of Events
        if event._state_has_changed:
            event._state_has_changed = False
            while event.waiting.peek():
                _task_queue.push_head(event.waiting.pop_head())

This solution will only cost performance by iterating over all events after an event has been set.

Drawback: because the Events are all linked together, an Event will never be reclaimed by the garbage collector, since this reference won't go away (unless you introduce an Event.deinit() method, which would be an odd MicroPython-specific extension to the Event class).

@tve
Contributor

tve commented Mar 25, 2020

My argument wasn't about fixed slots but about creating a facility that is easy to use in native modules for purposes other than just Event. I'm looking forward to improving some of the socket code so awaiting a socket operation isn't the current polling hell, which effectively prevents low-power sleeping. It would be nice to have something that doesn't require an ISR signal to travel through Python code and then go back into native code to finish off the processing. I realize that what I proposed doesn't fully accomplish that since it invokes a closure, mostly because I haven't dug into how closures are represented and called.

TBH not being able to reclaim Events is a no-go, especially since they stay in the list. But I think this is solvable; for example, Events could be explicitly registered for ISR use.

The more I think about it, the more I'm inclined towards making mp_sched_schedule ISR and multi-core safe. On the esp32 that might be done using a FreeRTOS queue, and if checking that queue in the event loop costs too much (I don't really think so) then we can always add a "check the queue" flag. Update: I just checked, and FreeRTOS's prvIsQueueEmpty() does not even require a critical section, so the check in the event loop is basically free: https://github.com/espressif/esp-idf/blob/master/components/freertos/queue.c#L1963-L1980.
All this would require Event to have a SetFromISR() method, but IMHO that would be fine. Something I don't fully understand is how mp_sched_schedule interacts with GC; maybe the functions in the sched queue are considered part of the GC roots...

But maybe dpgeorge already has it all figured out :-)

@dpgeorge
Member

The main problem to solve is how to wait for an event, efficiently. How does the async scheduler know that an event has occurred?

Currently, without any IO (eg sockets) there can be no events, so in such cases the waiting is just a sleep until the next task should be scheduled.

With IO (eg sockets) the waiting is either until the next task must be scheduled, or an IO entity becomes readable/writable/in-error. This is done using select.poll objects.

To handle more general events (eg BLE scan result, pin change, SPI DMA transfer half complete) I can see two main options:

  1. Require that all events be associated with an object that can be registered for read/write with the existing select.poll. This means events must be either "read" or "write" events which may be an awkward fit for a lot of cases (eg for pin change, the change could be edge or level, and high or low).
  2. Invent a new "event waiter" object that waits for a general event, with a timeout.

This has been discussed a bit here: #5332 (comment)

@tve
Contributor

tve commented Mar 30, 2020

@dpgeorge, I'm not sure how to interpret your comment, so let me try to start from the beginning.

Let's take the example of ESP-now so we have something concrete: https://docs.espressif.com/projects/esp-idf/en/release-v4.0/api-reference/network/esp_now.html. The interface is based on callbacks:

Call esp_now_register_recv_cb to register receiving callback function. When receiving ESP-NOW data, receiving callback function is called.
Call esp_now_send() to send ESP-NOW data and esp_now_register_send_cb to register sending callback function. It will return ESP_NOW_SEND_SUCCESS in sending callback function if the data is received successfully on MAC layer. Otherwise, it will return ESP_NOW_SEND_FAIL.

The callbacks "run from a high-priority WiFi task", which means it's almost like an interrupt; I wouldn't be surprised if the callbacks actually ran on the PRO core, which in some sense is worse than a regular same-core interrupt. Since ESP-now is about send/recv it would be possible to integrate it underneath the select/poll interface. In that case, a pile of C code would have the esp-now callback handlers and would implement the read/write/ioctl needed by the poll and stream interfaces. Synchronization between the ISR-like callback handlers and the read/write/ioctl stuff would be "an internal implementation problem", presumably solved by using some FreeRTOS queues. Let's call that option 1.

Option 2 might be to implement an esp_now Python object that has async def send and async def recv methods. Send transmits and then blocks to return a result. Recv blocks and returns the next message coming in. Inside this object there needs to be synchronization with the native esp-now send/recv callbacks. What I proposed is that we could make mp_sched_schedule ISR and multi-core safe for this purpose.
With that, the implementation of the esp_now object would be as follows. On send, the calling task is removed from the uasyncio sched queue, and when esp-now calls the C send callback, the callback handler invokes mp_sched_schedule to run a fragment of Python code that puts the task back onto the run queue. Similarly for receive: the C callback handler that gets invoked by esp-now uses mp_sched_schedule to run a code fragment that puts the application's recv task back onto the run queue.

Option 3 is almost the same as option 2 but uses an Event object that has a SetFromISR method implemented in C. In this option the esp_now Python object has two associated Event objects, one for sending and one for receiving. The esp_now.send function would transmit the packet and then block on the Event, which gets signaled when the send completes, at which point esp_now.send resumes and returns a success code to the caller. This is implemented by having the C send callback invoked by esp-now call a C-level SetFromISR on that Event object. In Kevin's proposal, the implementation of SetFromISR consists of setting three flags to true, and the tasks that are blocked on the Event would be placed on the run queue at the next scheduling iteration.
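Roughly, at the Python level option 3 could look something like this (a sketch only: EspNow, espnow_send() and the plain set() calls are stand-ins for the native layer and the C-level SetFromISR; none of these names exist yet):

import uasyncio as asyncio

class EspNow:
    def __init__(self):
        self._send_done = asyncio.Event()   # signalled by the C send callback
        self._recv_ready = asyncio.Event()  # signalled by the C recv callback
        self._send_status = None
        self._rx = []

    # These two would be invoked from the native callbacks, via SetFromISR:
    def _on_send_done(self, status):
        self._send_status = status
        self._send_done.set()               # would be the C-level SetFromISR in practice

    def _on_recv(self, msg):
        self._rx.append(msg)
        self._recv_ready.set()              # would be the C-level SetFromISR in practice

    async def send(self, peer, msg):
        self._send_done.clear()
        espnow_send(peer, msg)              # hypothetical native call that starts the TX
        await self._send_done.wait()        # resumed once the send callback fires
        return self._send_status

    async def recv(self):
        while not self._rx:
            self._recv_ready.clear()
            await self._recv_ready.wait()   # resumed once the recv callback fires
        return self._rx.pop(0)

The only ISR-facing piece is the SetFromISR call inside the two _on_* handlers; everything else stays ordinary uasyncio code.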

A different use-case, such as BLE scan, would not have option 1 available because it doesn't map well onto read/write, but both options 2 & 3 are pretty straightforward.

Am I missing something or perhaps not understanding your concerns?

@dpgeorge
Member

dpgeorge commented Apr 5, 2020

Re option 1:

In that case, a pile of C code would have the esp-now callback handlers and would implement the read/write/ioctl needed by the poll and stream interfaces.

In this particular case of ESP-NOW, how does stream read/write map to the ESP-NOW events? I guess the obvious thing is for the read callback to make the object readable (in the stream sense, with poll/select), and the object is initially writable; once send is called it becomes unwritable until ESP_NOW_SEND_SUCCESS is received, at which point it becomes writable again (in the stream sense). If ESP_NOW_SEND_FAIL is received instead then the object goes into an error state and polling it gives POLLERR.
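To make that mapping concrete, something along these lines (purely illustrative; the real logic would live in the C-level stream ioctl/poll handler, and the function and argument names here are made up):

import select

def espnow_poll_flags(rx_pending, send_in_flight, send_failed):
    flags = 0
    if rx_pending:            # the recv callback has queued data -> readable
        flags |= select.POLLIN
    if not send_in_flight:    # writable until send() is called...
        flags |= select.POLLOUT
    if send_failed:           # ...and POLLERR once ESP_NOW_SEND_FAIL comes back
        flags |= select.POLLERR
    return flags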

So ESP-NOW could map quite cleanly to the stream paradigm, but it wouldn't be so easy for objects like BLE which don't have a clear read/write interface.

For options 2 & 3, I understand what you are saying. Regardless of whether it's useful for uasyncio, it would make sense to make mp_sched_schedule ISR/multi-core safe.

But the point I was trying to make previously is: how does the main uasyncio loop know that an event (eg ISR) occurred and that a task was put on to the run queue? Schematically the uasyncio loop does this:

while True:
    dt = get_minimum_time_to_next_waiting_task()
    wait_for_event_or_timeout(dt) # this may push a task to the front of the queue
    t = pop_next_runnable_task()
    run_task(t)

and the question is how wait_for_event_or_timeout(dt) is implemented. At the moment it uses poller.ipoll(dt) which continuously loops over all registered IO objects, checking if they are ready (for read/write/error) and returns if any are, or if the timeout dt is reached.

This polling loop actually works quite efficiently on an ARM-based MCU because it uses WFI in the loop, so it essentially waits for any IRQ (which is the only source of an event that could make an object readable/writeable) and then checks all objects again.

But that's still not ideal because 1) it's O(N) each loop, where N is the number of registered objects; and 2) it can't do a more efficient sleep, since it has to keep polling.

Ideally wait_for_event_or_timeout(dt) would be machine.lightsleep(dt) so the MCU could go into a really low-power state while waiting (eg this could work with wifi active, just waiting on any incoming packets which wake the MCU from the sleep state; both esp32 and PYBD can support this). Considering that machine.lightsleep(dt) can wake early, the following might be needed:

def wait_for_event_or_timeout(dt):
    t0 = ticks_ms()
    while True:
        t_remain = ticks_diff(ticks_add(t0, dt), ticks_ms())
        if t_remain <= 0:
            return # timeout reached
        machine.lightsleep(t_remain) # any events we wait on must wake this sleep state
        if event_of_interest_occurred():
            return # event occurred which we were waiting on

@tve
Contributor

tve commented Apr 5, 2020

But the point I was trying to make previously is: how does the main uasyncio loop know that an event (eg ISR) occurred and that a task was put on to the run queue?

So I believe there are two parts to this question: a data structure part and a wake-up part. I believe you're really asking about the second. I'm not familiar with the stm32 port, but I have used WFI :-). The steps I expect to see are:

  1. The uasyncio loop sees that there's nothing to do for N ms and calls WFI with a timer interrupt set to N (or a periodic timer).
  2. Some peripheral raises an interrupt condition and the C-level interrupt handler runs.
  3. The interrupt handler places "something" into a uasyncio "run" queue: it could be a task pushed onto the current run queue like Event.set does (which would require making access to the queue atomic), a task pushed onto an auxiliary run queue like the slots or linked list described in the first couple of comments here, or a callback placed onto the mp_sched_schedule queue. Except for differences in implementation this would be the same for all ports.
  4. The interrupt handler then needs to make sure the uasyncio loop actually resumes. With WFI there is the SLEEPONEXIT bit in the SCB->SCR register; if that bit is cleared ("don't sleep on interrupt return"), doesn't that mean the uasyncio loop resumes? It would then look at the queues and do the right thing, no?
  5. On the esp32 you probably don't want to call lightsleep because that disrupts Wifi; instead, with auto-light-sleep enabled, it is sufficient to call ulTaskNotifyTake with a duration of 4 ticks or more (see my PR https://github.com/micropython/micropython/pull/5473/files#diff-b9499fc8ad5b9793822626f6befdb1b6R253). To wake up the uasyncio loop, the interrupt handler needs to call xTaskNotify, which makes ulTaskNotifyTake return immediately, thereby resuming the uasyncio loop.
  6. On the esp32 the part I don't yet know is how to make socket I/O event driven. It would be nice to get a callback from esp-idf on one of its threads when there is socket activity; this could then call xTaskNotify, which would initiate a round of the current ipoll stuff. Alternatively, there may be something new in esp-idf v4 around the tcp_adapter and the new event system...

Getting all the above to work right is obviously a bit tricky, but I believe it can be done. Did I miss something or misunderstand the problem?

@dpgeorge
Member

dpgeorge commented Apr 6, 2020

The steps I expect to see are:
...
Getting all the above to work right is obviously a bit tricky, but I believe it can be done.

I think that's a good summary of what needs to be done.

For the current implementation using select.poll, steps 2-6 are effectively what poll does for you. But it only does it for streams, not arbitrary events. That gets to my original comment above #5795 (comment): either everything is made to act like a stream and be pollable, or a new low-level event waiting mechanism (event waiter object) is invented (and works, for example, as described in steps 2-6 just above).

It's a hard problem because various ports like unix, zephyr, esp32, stm32 all have different underlying "operating systems" and support events in different ways.

Going back to the ESP-NOW example, it may not actually suit a stream protocol: for streams that have an error (POLLHUP, POLLERR) the stream is then considered unusable from that point, and all operations that are waiting (both read and write) are woken and raise an exception (OSError). For ESP-NOW that is not desired behaviour, because an error with send does not necessarily mean the connection is broken/lost. Receiving would still work and the next send might work as well. You don't want the reading code to raise an exception just because the send of one packet failed.

Also, a stream being in a readable/writeable state is just that, a state. And polling is about being notified when that state changes. But an event is not a state. An event would be a state change. And one could have a queue of events which is in a state of being empty or non-empty (eg readable or not). But events are not states and trying to combine those concepts together using poll might get messy.

To wake up the uasyncio loop, the interrupt handler needs to call xTaskNotify, which makes ulTaskNotifyTake return immediately thereby resuming the uasyncio loop.

That means all interrupts/events that could possibly be used in uasyncio must have a way of notifying the loop. So there has to be some kind of registration of what events the loop is interested in. Or in the simple case just have a global "any_event" notification that is signalled by all possible ISRs, and to which anyone can subscribe, and the loop would just go back to sleep if it wasn't interested in any of the events that woke it (this is how WFI would work because WFI wakes on any IRQ).
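Schematically the "any_event" variant would be something like this (a sketch only; _any_event would really be set from C-level ISRs, and _registered_events and notify_any_event are invented names for whatever subscription mechanism the loop ends up with):

_any_event = False
_registered_events = []  # whatever subscription list the loop keeps

def notify_any_event():
    # cheap enough to call from any ISR path: a single flag store
    global _any_event
    _any_event = True

def event_of_interest_occurred():
    # called by the loop after it wakes (cf. wait_for_event_or_timeout above)
    global _any_event
    if not _any_event:
        return False
    _any_event = False
    # go back to sleep unless one of the subscribed events actually fired
    return any(ev.is_set() for ev in _registered_events)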

On the esp32 the part I don't yet know is how to make socket I/O event driven.

There's already a simple mechanism to provide asynchronous callbacks on sockets that become readable, grep for MICROPY_PY_USOCKET_EVENTS. That could be enhanced by starting a new thread to do the polling (rather than the VM checking periodically).

@tve
Contributor

tve commented Apr 6, 2020

Also, a stream being in a readable/writeable state is just that, a state. And polling is about being notified when that state changes. But an event is not a state. An event would be a state change. And one could have a queue of events which is in a state of being empty or non-empty (eg readable or not). But events are not states and trying to combine those concepts together using poll might get messy.

I would have to agree. WRT ESP-now, I would want to refresh my memory of how UDP works with select/poll. ESP-now would have to be similar, except for the reliability aspect (which is a big difference). But in the end, I think the only reason we're looking to shove everything into select/poll is that it's the only thing the inner loop currently checks.

To wake up the uasyncio loop, the interrupt handler needs to call xTaskNotify, which makes ulTaskNotifyTake return immediately thereby resuming the uasyncio loop.

That means all interrupts/events that could possibly be used in uasyncio must have a way of notifying the loop. So there has to be some kind of registration of what events the loop is interested in. Or in the simple case just have a global "any_event" notification that is signalled by all possible ISRs, and to which anyone can subscribe, and the loop would just go back to sleep if it wasn't interested in any of the events that woke it (this is how WFI would work because WFI wakes on any IRQ).

Can you suggest an example use-case to think about or to prototype? Maybe a Pin.wait_for_edge call (I know there's Pin.irq)?

On the esp32 the part I don't yet know is how to make socket I/O event driven.

There's already a simple mechanism to provide asynchronous callbacks on sockets that become readable, grep for MICROPY_PY_USOCKET_EVENTS. That could be enhanced by starting a new thread to do the polling (rather than checking periodically by the VM).

You're right, having a thread in an infinite loop calling select/poll would be one option. Yuck...

@tve
Contributor

tve commented Apr 6, 2020

I spent some time looking at py/scheduler.c WRT the esp32. It uses the MICROPY_BEGIN/END_ATOMIC_SECTION macros, which are not multi-core safe. But then I looked at the various esp-idf device drivers and they all state something like "ISR handler will be attached to the same CPU core that this function is running on". So I would say that we shouldn't need to make the scheduler multi-core safe.

This leaves out something like the ESP-Now callback and probably some other communication-related callbacks, but it's probably wiser to push the burden onto them to deal with the multi-core aspect internally, e.g. have the ISR use a FreeRTOS primitive to signal a task that runs on the MP core and have that call mp_sched_schedule. An alternative would be to rewrite the scheduler to be multi-core safe using CAS instructions, which I believe is entirely doable.

I also noticed that a bunch of peripherals don't support completion interrupts: I2C, SPI, I2S come to mind. Instead they have a read-with-timeout function. For uasyncio I have the feeling the best approach is to wrap this with a coro that explicitly polls (a while loop with a zero-timeout read and an await sleep).
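Something along these lines (a sketch only; read_with_timeout is a placeholder for the driver's read-with-timeout call, not an existing machine API):

import uasyncio as asyncio

async def read_async(dev, nbytes, poll_ms=5):
    while True:
        data = dev.read_with_timeout(nbytes, timeout=0)  # non-blocking attempt
        if data:
            return data
        await asyncio.sleep_ms(poll_ms)  # yield to other tasks between polls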

I believe other ports are simpler 'cause they're single-core and so the atomic sections around the scheduler primitives are entirely sufficient.

Ugh, it's getting late here, my brain is fading... So why can't device events in asyncio be solved with micropython.schedule() and machine.disable_irq()? If we make a variant of Event, say ISREvent, and wrap all the methods with machine.disable_irq(), then an IRQ handler can call mp_sched_schedule/micropython.schedule() to run some ISREvent.set() method. Did I miss something somewhere?

With that ISREvent building block, a device driver with asyncio support can internally allocate the ISREvents it needs and use them to signal between its ISR and waiting coros. I'll see whether I can cook up an example tomorrow...
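Roughly what I have in mind, assuming uasyncio v3's Python-level Event (set_from_isr and the pre-bound callback are just illustrative, not an agreed API):

import machine, micropython
import uasyncio as asyncio

class ISREvent(asyncio.Event):
    def __init__(self):
        super().__init__()
        # pre-bind the soft callback so set_from_isr() doesn't allocate in the IRQ handler
        self._sched_cb = self._sched_set

    def set(self):
        state = machine.disable_irq()  # keep IRQs out while the waiting-task queue is modified
        try:
            super().set()
        finally:
            machine.enable_irq(state)

    def _sched_set(self, _arg):
        self.set()

    def set_from_isr(self):
        # callable from a hard IRQ handler: only queues a soft callback,
        # which runs self.set() later in the VM, between bytecodes
        micropython.schedule(self._sched_cb, None)

A driver's IRQ handler would then just call ev.set_from_isr(), and coros blocked on ev.wait() get released via the normal soft-callback path.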

@dpgeorge
Member

dpgeorge commented Apr 7, 2020

So why can't device events in asyncio be solved with micropython.schedule() and machine.disable_irq()?

Fully disabling IRQs is quite drastic. There may be some very low-level IRQs which must still run without interruption (and are independent of MicroPython). All that's needed is to disable all IRQs that may run Python code and that may access the event queue.

It also probably needs something like uasyncio.notify_event() which the IRQ would call to notify a possibly sleeping uasyncio event loop (this may be a no-op on some systems).

If we make a variant of Event, say ISREvent and wrap all the methods with machine.disable_irq() then an IRQ handler can call mp_sched_schedule/micropython.schedule() to run some ISREvent.set() method.

The TaskQueue methods also need to be protected against re-entry (eg an IRQ can come in while adding a task to a queue).
