Skip to content
This repository has been archived by the owner on Sep 6, 2023. It is now read-only.

Hangs when reopening WiFi connection #167

Closed
peterhinch opened this issue Aug 22, 2017 · 22 comments
Closed

Hangs when reopening WiFi connection #167

peterhinch opened this issue Aug 22, 2017 · 22 comments

Comments

@peterhinch
Copy link
Contributor

This sample works if run after a power cycle. It hangs if a WiFi connection is already present (e.g. on a second run). It works if the commented-out sleep() statements are inserted. The aim here is to achieve a reliable (re)connection regardless of initial conditions and after a WiFi outage.

import uasyncio as asyncio
from utime import sleep
import network
s = network.WLAN(network.STA_IF)
s.active(True)

async def wifi_connect():
    print('WiFi connect')
    s.disconnect()
#    sleep(0.1)
    await asyncio.sleep(1)
    s.connect('SSID', 'PASSWORD')  # EDIT this
    print('Awaiting conection')
    while not s.isconnected():
#        sleep(0.1)
        await asyncio.sleep(1)
    print('Got conection, pausing')
    for _ in range(3):
#        sleep(0.1)
        await asyncio.sleep(1)
    print('conection done')
    return

loop = asyncio.get_event_loop()
loop.run_until_complete(wifi_connect())

Is there an issue of yielding to the underlying RTOS? sleep() statements are of course inappropriate in asynchronous code.

@peterhinch
Copy link
Contributor Author

Reading around the Espressif forums I've found references to the need periodically to allocate time to the underlying idle task. If there were a system call to run the idle task I could write a modified uasyncio to do this when idle or waiting on a delay. This would be far preferable to having to insert arbitrary sleep() calls in the application.

@robert-hh
Copy link
Contributor

There is a machine.idle() method already, which goes like that:

STATIC mp_obj_t machine_idle(void) {
    taskYIELD();
    return mp_const_none;
}

I don't know if that is what you need.

@peterhinch
Copy link
Contributor Author

peterhinch commented Sep 3, 2017

Robert, that does indeed look promising - thank you, well spotted. I'd assumed its function was the same as the ESP8266 version (which is entirely different) so never thought to investigate it. I'll experiment.
[EDIT]
I'm having some success with a continuously running task:

    async def _idle_task(self):
        while True:
            await asyncio.sleep_ms(10)
            idle()  # Yield to underlying RTOS

but initial results suggest it hasn't entirely eliminated the need for sleep(). I'll report back when I have something more definitive.

@peterhinch
Copy link
Contributor Author

After much experimentation the following precautions are required to make nonblocking sockets play tolerably well with uasyncio and to enable WiFi connections to reliably restart after an outage.

I have a function which issues utime.sleep_ms(20). In tests10ms is inadequate, 20ms seems reliable. This is referred to below as a pause.

  1. Run an idle task as above.
  2. Where data is written to a socket and a response is expected, a pause is required before reading the data or testing the socket for the presence of data. Without this the incoming data can fail to be recognised.
  3. When (re)connecting a station interface to WiFi disconnect, pause, connect, then loop on isconnected() with a pause in the loop.
  4. In all other cases where code loops on isconnected() a pause is required for the status change to reliably be recognised.

It would be good to know the reason for this and whether it would be possible to enhance idle() so that the need for these arbitrary ad hoc delays could be eliminated.

Note that issuing, say, uasyncio.sleep(1) does not obviate the need for these pause calls, even though the idle task will have run many times during the delay. An explicit delay is required.

@dpgeorge
Copy link
Member

dpgeorge commented Sep 6, 2017

@peterhinch thanks for the details on how to "fix" this. This is really a problem with the underlying "operating system" and so should be worked around in C code so that the Python programmer doesn't need to know about it (and of course so that scripts are portable across platforms).

@peterhinch
Copy link
Contributor Author

If progress is reported I'll re-test against the improved firmware build.

@peterhinch
Copy link
Contributor Author

Did anyone ever get to look into this?

@dpgeorge
Copy link
Member

I took a quick look and can confirm that the issue is there. The following code reproduces it without asyncio:

import network
s = network.WLAN(network.STA_IF)
s.active(True)

def wifi_connect():
    print('WiFi connect')
    s.disconnect()
    s.connect('SSID', 'PASSWORD')
    print('Awaiting conection')
    while not s.isconnected():
        pass
    print('Got connection')
    print(s.ifconfig())

wifi_connect()

After a hard reset it works, but second time round (without hard reset) it locks up in the isconnected() loop. Same as originally reported in this issue.

It does look like the idle task is being starved (by the uPy VM loop) and needs some time to process underlying network events. Changing the uPy task from priority 1 down to priority 0 doesn't help. Inserting a vTaskDelay(0) in the VM loop doesn't help. Making it vTaskDelay(1) does fix the issue but makes the VM very slow.

It's inclear how exactly to fix this but the first thing would be to update to the latest ESP IDF and then retest using that, then go from there.

@brandond
Copy link

Why the busy-wait with pass? Why not idle-loop with a short sleep? That's what I have in my startup handler and it works fine.

@dpgeorge
Copy link
Member

Why the busy-wait with pass? Why not idle-loop with a short sleep?

It's more to show the bug than as a real example. In real code that uses, eg, uasyncio one can't put short sleeps randomly throughout the code.

@brandond
Copy link

Keep in mind that the main fork of micropython only enables one of the two ESP32 cores, and a bunch of critical work needs to be done in the system idle process which is easy to starve if you're busy-waiting. I've been using the @loboris fork that enables both cores, which seems to handle some of this stuff better.

@kevinkk525
Copy link

kevinkk525 commented Feb 2, 2018

were you able to run asyncio on the loboris fork? I tried it with both cores enabled and not even the simplest async function (even with idle() in it) does work. It hangs until the watchdog gets triggered. It does not even execute the first command inside the async function, like this:

import uasyncio as asyncio
import machine

async def test():
    print("test")
    machine.idle()

loop=asyncio.get_event_loop()
loop.run_until_complete(test())

Results after a few seconds in:

Task watchdog got triggered. The following tasks did not reset the watchdog in time:
 - mp_task (CPU 0/1)
Tasks currently running:
CPU 0: IDLE
CPU 1: IDLE

@kevinkk525
Copy link

I was able to trace the problem down to the function utimeq.peektime() which responds good on esp8266 but on esp32 this gives a really large number so the asyncio.wait() is called with a really high number. But I'm not sure how this error happens.
Please see the issue #53 on loboris fork where I put the debug output and traced it to the utimeq library.

@robert-hh
Copy link
Contributor

robert-hh commented Feb 3, 2018

There is some difference between the ESP8266 and loboris port regarding time stamps. The esp8266 port treats all time stamps as short integers (31 bit), which eventually may wrap around. The code deals fine with that. In the loboris port, ticks_ms() and ticks_us() return up to 64 bit quantities, but in while used by utimeq module and eventually by ticks_diff() and ticks_add() these may be truncated. Funny enough, it even runs into an error. Just try:
ticks_diff(0x80000000, 0x80000001)
Since that situation get only a problem after a while, shortly after reset all should be in sync.
Edit:
Side note: ticks_diff() need a different argument order for the pycom port vs. the micropython/loborins port.

@kevinkk525
Copy link

You are right, thank you. The issue is being fixed by loboris currently.
Sorry for making this a bit "off-fork".

@mattytrentini
Copy link
Sponsor

I've tried reproducing the problem as originally described on an M5Stack using today's 1.9.4 build (specifically: esp32-20181001-v1.9.4-622-g69e790390.bin) but haven't been able to generate the same issue.

Just to be sure I'm trying going through the same process...

I copied the text as reported, removed the sleeps and used my local wifi connection details:

import uasyncio as asyncio
from utime import sleep
import network

s = network.WLAN(network.STA_IF)
s.active(True)

async def wifi_connect():
    print('WiFi connect')
    s.disconnect()
    await asyncio.sleep(1)
    s.connect('SSID', 'password')
    print('Awaiting conection')
    while not s.isconnected():
        await asyncio.sleep(1)
    print('Got conection, pausing')
    for _ in range(3):
        await asyncio.sleep(1)
    print('conection done')
    return

loop = asyncio.get_event_loop()
loop.run_until_complete(wifi_connect())

Saved it as main.py and uploaded it to an M5Stack Core (ie no PSRAM) unit with the MicroPython version mentioned above and uasyncio installed like so:

import network, upip
wlan = network.WLAN(network.STA_IF)
wlan.active(True)
wlan.connect('SSID', 'password')
upip.install('micropython-uasyncio')

Then I opened a repl to the device using mpfshell (with the repl command) and pressed the reset button a number of times. The device responded - after around a 15s delay - each time.

Was that the right sequence of events to raise the issue? Can anyone else confirm the result?

@peterhinch
Copy link
Contributor Author

It will reconnect after a hard reset or power cycle. The question is whether it can reconnect on a second run (e.g. after a soft reset).

You might also want to try @dpgeorge 's code sample as it is a more minimal test case than mine.

@mattytrentini
Copy link
Sponsor

OK, I see - the soft reset is the problem. Unfortunately I can confirm that the problem still exists on esp32-20181001-v1.9.4-622-g69e790390.bin. I used Damien's code. pressed hard reset (fine) followed by soft reset (hang after 'Awating connection').

@dpgeorge
Copy link
Member

To make progress on this issue here are some things to try:

  • enable the WDT to check if the idle task is being starved, via CONFIG_TASK_WDT_CHECK_IDLE_TASK
  • change the uPy config to use both cores of the esp32, and pin uPy to the second core (probably a good idea to enable both cores now that it has better support)

@nickzoic
Copy link
Collaborator

nickzoic commented Oct 25, 2018

OK so: The utime.sleep_ms(10) is insufficient because mp_hal_delay_ms won't get around to calling ulTaskNotifyTake for times <= 10ms, and that's what lets higher priority tasks take over, I think.

That's also a mechanism where MICROPY_EVENT_POLL_HOOK gets called and polls sockets, so a lot of things don't work without a sleep_ms(11) every now and then. There's another issue for this, somewhere.

machine.idle calls taskYIELD() so that's an option. I agree with Damien that this should be invisible to the Python user though. The #1 easy quick fix would be to taskYIELD from is_connected (tried it, it works), but that won't help with the socket polling issue.

We also haven't really answered the question of why it works on the first reset ... what changes?

[UPDATE: some of that I can't replicate in the cold light of morning ... not sure about taskYIELD.]

@nickzoic
Copy link
Collaborator

PS: I just noticed this is in the old repo. sigh we're going to have to do something about this repo, archive it off or something.

I created micropython/micropython#4269 to carry this issue forward ...

@dpgeorge
Copy link
Member

We also haven't really answered the question of why it works on the first reset ... what changes?

That is a very good point, and helped me to finally fix this issue. See discussion at micropython/micropython#4269 for further details about the fix.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants