Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TW#13516] Wifi Disconnect Loop #724

Closed
Resultfactory opened this issue Jun 21, 2017 · 19 comments
Closed

[TW#13516] Wifi Disconnect Loop #724

Resultfactory opened this issue Jun 21, 2017 · 19 comments

Comments

@Resultfactory
Copy link

Lately after a few hours of running the esp32 goes into a disconnect loop.

A crash used to occur on:

I (22425788) wifi: ap_probe_send over, resett wifi status to disassoc

See log

Ever since esp-idf 058eb26, the esp32 is going into a wifi disconnect loop instead of a crash.
See log. The last 10 lines in this log file repeat until a reset happens.

We have tested esp_wifi_disconnect(), which properly disconnects and reconnects.
See log

Except on those wifi: ap_probe_send over, resett wifi status to disassoc
This problem occurs on a DOIT board with the esp-wroom32, Pycom LoPy and SiPy. We also tested this on 4 different routers.

Another disconnect loop on both a DOIT and LoPy log where wifi log level is set to verbose. This happend after we updated esp-idf to 7ed8c66

What can we do about this problem? We would like to recover from this disconnect. A crash is less problematic because after a restart the esp32 functions again.
The same issue is posted on esp32.com

@Resultfactory
Copy link
Author

We added a wifi deinit to try and circumvent the disconnect loop.

static void reinit_wifi(void *param)
{
	ESP_LOGD(TAG, "Stop, Deinit Wifi");
	ESP_ERROR_CHECK(esp_wifi_stop());
	ESP_ERROR_CHECK(esp_wifi_deinit());

	ESP_LOGD(TAG, "Init, Start Wifi");
	ESP_ERROR_CHECK(esp_wifi_init(&cfg));
	ESP_ERROR_CHECK(esp_wifi_set_storage(WIFI_STORAGE_FLASH));
	ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_STA));
	ESP_ERROR_CHECK(esp_wifi_set_config(WIFI_IF_STA, &wifi_config));
	ESP_ERROR_CHECK(esp_wifi_start());

	esp_wifi_set_ps(WIFI_PS_NONE); // Disable powersave

	ESP_LOGD(TAG, "Restart Wifi Done");
	vTaskDelete(NULL);
}

This function is started as a task, after 10 consecutive disconnects.
We tested this function by restarting the router. This caused 10 consecutive disconnects and then the ESP32 reconnected to the router and the internet. See log
It seems I (81906) wifi: ap_probe_send over, resett wifi status to disassoc also occurs when resetting the router.

This build ran from 21-6-2017 17:30 till 23-6-2017 on a DOIT and a LoPy
The disconnect loop occurred three times on both devices.
LoPy: log
DOIT: log

The disconnect loops and subsequent crashes occurred on 6 different times. These are the last time marks in the log files before the crash:

DOIT
I (14567551) Statistics: [21-06-17 20:06:56Z] Uptime: 000.04:02:38	IPv4: 10.0.2.251	Free Heap: 17140	Temperature: 75
I (9736161) Statistics: [22-06-17 16:21:57Z] Uptime: 000.02:42:07	IPv4: 10.0.2.251	Free Heap: 75276	Temperature: 76
I (7312241) Statistics: [22-06-17 18:25:33Z] Uptime: 000.02:01:43	IPv4: 10.0.2.251	Free Heap: 75948	Temperature: 75
LoPy
I (23643342) Statistics: [21-06-17 22:09:54Z] Uptime: 000.06:33:54	IPv4: 10.0.2.250	Free Heap: 55524	Temperature: 41
I (4682685) Statistics: [22-06-17 01:18:33Z] Uptime: 000.01:17:53	IPv4: 10.0.2.250	Free Heap: 62288	Temperature: 41
I (12137265) Statistics: [22-06-17 23:27:11Z] Uptime: 000.03:22:07	IPv4: 10.0.2.250	Free Heap: 75264	Temperature: 41

@HubbyGitter
Copy link

My code has a control flow aus outlined below: I can confirm that the connect calls yield a disconnect event with reason 201 very often. It takes e.g. 5 or 8 attempts to successfully connect. However, the first connect attempt after a reset via button appears to be always successful -- but not necessarily after the "hard reset" which make flash announces.

ESP32 Rev 0 here.

startup:
  tcpip_adapter_init()
  esp_wifi_init()
  esp_wifi_set_storage()
  esp_wifi_set_mode()
  esp_wifi_set_config()
ondemand:
  WAIT_FOR_DEMAND
  esp_wifi_start()
  esp_wifi_connect()
active:
  PROCESS_DEMAND
  CHECK_DEMAND_OR_TIMEOUT
  IF (demand) GOTO active
  esp_wifi_disconnect()
  esp_wifi_stop()
  GOTO ondemand

@FayeY FayeY changed the title Wifi Disconnect Loop [TW#13516] Wifi Disconnect Loop Jun 26, 2017
@Resultfactory
Copy link
Author

Resultfactory commented Jun 26, 2017

@HubbyGitter We have no trouble connecting to the AP. There are certain situations where calling esp_wifi_connect() is not enough. Our Wifi event handler can successfully recover from a disconnect, even the reinit_wifi() has successfully run. mentioned in our previous post:

See log

We enabled CONFIG_SW_COEXIST_ENABLE, this does not have a effect on the occurrence of the disconnect loop.
Our system ran this weekend on the DOIT and LoPy. 9 Disconnect loops occurred at these times:

DOIT
I (2339213) Statistics: [24-06-17 17:56:45Z] Uptime: 000.00:38:50	IPv4: 0.0.0.0	Free Heap: 139816	Temperature: 74
I (32786244) Statistics: [25-06-17 03:04:19Z] Uptime: 000.09:06:17	IPv4: 10.0.2.253	Free Heap: 74476	Temperature: 74
I (10469024) Statistics: [25-06-17 06:00:27Z] Uptime: 000.02:53:36	IPv4: 10.0.2.253	Free Heap: 38792	Temperature: 73
I (69754) Statistics: [25-06-17 06:08:20Z] Uptime: 000.00:01:00	IPv4: 10.0.2.253	Free Heap: 78924	Temperature: 74

LoPy
I (38835507) Statistics: [25-06-17 10:28:41Z] Uptime: 000.10:47:07	IPv4: 10.0.2.251	Free Heap: 66060	Temperature: 41
I (11640467) Statistics: [25-06-17 14:45:12Z] Uptime: 000.03:13:51	IPv4: 10.0.2.251	Free Heap: 34316	Temperature: 40
I (674877) Statistics: [25-06-17 14:58:18Z] Uptime: 000.00:11:05	IPv4: 10.0.2.251	Free Heap: 79044	Temperature: 40
I (1662877) Statistics: [25-06-17 20:40:20Z] Uptime: 000.00:27:34	IPv4: 10.0.2.251	Free Heap: 78648	Temperature: 41
I (8036827) Statistics: [25-06-17 23:17:55Z] Uptime: 000.02:13:47	IPv4: 10.0.2.251	Free Heap: 78080	Temperature: 41

In all 9 disconnect loops the WDT triggers ( The Task watchdog is set to 60seconds)
(log)
These lines from reinit_wifi are never reached:

	ESP_LOGD(TAG, "Restart Wifi Done");
	vTaskDelete(NULL);

@HubbyGitter
Copy link

HubbyGitter commented Jun 26, 2017

@Resultfactory I'd appreciate if you elaborated a bit. When is calling esp_wifi_connect() not enough?

Maybe I should have mentioned that the control flow stops the WiFi and starts from there when an event occurs which does not lead in the right direction.

@Resultfactory
Copy link
Author

Resultfactory commented Jun 26, 2017

@HubbyGitter that is our question as well. Our Wifi works fine, until this Disconnect loop happens Log. On every disconnect event, esp_wifi_connect() is called. After 10 consecutive disconnects we try to restart the wifi with reinit_wifi() which ends up triggering the WDT.

This weekend for example, the system was online for 10hours 48minutes (Log). Then this disconnect loop happens, we call reinit_wifi() and the system crashes.

@rojer
Copy link
Contributor

rojer commented Jun 27, 2017

gentlemen, can i ask you if wifi station reconnect works for you after AP goes down and back up?
i just filed #738 and feel like an idiot - it seems so trivial.
but "reason 201" loop is the common part, so can i ask you to test and see if this may be an issue for you too?

@TimXia
Copy link
Contributor

TimXia commented Jun 28, 2017

@Resultfactory If disconnect event happens, do not need to reinit WiFi, just call esp_wifi_connect() to reconnect.

@HubbyGitter
Copy link

@rojer For me, it does not matter why a reconnect takes place -- lost connection or intended disconnect. The first connect after system boot works well, and after that, I'm running into (finite) 201 loops, which do succeed eventually (e.g. as I write this, on the fourth attempt). I'm stopping WiFI but not reinitialising it in my FSM if something goes wrong and also, there'll be a short delay. The latest updates as of now have not changed the behaviour substantially.

@FHFS
Copy link

FHFS commented Jun 28, 2017

FWIW, we're aware of this bug and we're trying hard to get this fixed before the 2.1 release. - ESP_Sprite on esp32.com

@HubbyGitter
Copy link

@FHFS Thank you for sharing this piece of critical information.

@Resultfactory
Copy link
Author

Resultfactory commented Jul 3, 2017

@HubbyGitter We tested your workaround for reconnecting to the Wifi AP. You call a esp_wifi_stop & esp_wifi_start. We have tried doing the same. For the record we scan BLE advertisements. (*)On every disconnect event we call a esp_wifi_connect, after 10 disconnects with failed esp_wifi_connect's we call esp_wifi_stop & esp_wifi_start. The call to esp_wifi_start never returns and a WDT is generated.

We also preformed this test over the weekend with BLE turned off. The same 201 disconnect loop occurs, but we are able to recover by calling esp_wifi_stop & esp_wifi_start.
(*)EDIT clarifaction

@HubbyGitter
Copy link

@Resultfactory I don't see it as a workaround to stop the WiFI if it delivers an unexpected event. It doesn't come with too much extra costs (the most expensive stuff happens on wifi init), saves a case distinction or two, and gets you back to a consolidated state (with reasonable likelihood).

I have no idea what "count to ten" means (not an empty loop from 1 to 10, hopefully) and no, I'm delaying between wifi stop and wifi start, not between error and wifi stop -- wifi stop delivers an asynchronous event, for which I first wait (with a generous timeout). Still, there's a delay before the retry. Aggressive retries are rarely helpful, especially since a retry's justification is the hope that something changed in the meantime.

Yes, I'm in the robust code fan club ®️... ✅

@Resultfactory
Copy link
Author

@HubbyGitter Thanks for the suggestion, we will test adding one second delay between the stop and start. Although the test we did without BLE recovered nicely without delay from a 201 disconnect loop with stop start.

@liuzfesp
Copy link
Contributor

Hi @Resultfactory I think this issue already fixed in release 2.1

@FayeY
Copy link
Collaborator

FayeY commented Jul 23, 2017

Any update? Is this problem solved now?

@FayeY FayeY closed this as completed Jul 31, 2017
@Resultfactory
Copy link
Author

@FayeY
Today we encountered our first disconnect loop in a few weeks. Our code had changed alot(we don't continuously scan ble advertisements anymore), this is why we could not be certain if esp-idf v2.1 fixed this issue.
See line 336 of our log
The last message came in on

9-8-2017 14:17:59> Device: [dev01], Data:[{"Uptime":"000.00:47:00", "Connects":1, "Disconnects":0, "KeysReceived":0, "OpenDoors":0, "InvalidKeys":0, "FreeHeap":56420, "Temperature":41, "IPv4":"10.0.2.250"}]

The disconnect loop keeps repeating, sometimes it finds the AP and the state is set to run. But then disconnects.

W (3668260) BEAST: restartWifi -> done
E (3668260) BEA
ST: Unhandled event 3
W (3668270) BEAST: network_status_changed: esp_wifi_connect on SYSTEM_EVENT_STA_START
I (3668400) wifi: n:4 0, o:1 0, ap:255 255, sta:4 0, prof:1
I (3668400) wifi: state: init -> auth (b0)
I (3668400) wifi: state: auth -> assoc (0)
I (3668410) wifi: state: assoc -> run (10)
I (3669470) wifi: connected with linksysTest, channel 4
D (3669470) BEA
ST: WiFi RSSI: -61
I (3672460) wifi: state: run -> auth (2c0)
I (3672460) wifi: n:4 0, o:4 0, ap:255 255, sta:4 0, prof:1
E (3672460) BEA*ST: SYSTEM_EVENT_STA_DISCONNECTED (107), Reason 2

@FayeY FayeY reopened this Aug 10, 2017
@ricksondpenha
Copy link

Hi is there any update on this? any workarounds? this seems to be a critical bug in our project.. restarting the esp wifi during mid transmission is a big bummer!!

@liuzfesp
Copy link
Contributor

HI @ricksondpenha is this issue easy to reproduce or not? According to the log line88, the RSSI is -71, it means the WiFi signal is week, I guess the AP (the router) is far aways from the device (esp32). It disconnects in 336 because esp32 receives a de-auth packet from AP, then the esp32 can't find the AP. So could you help to check the following things:

  1. How far is the AP from ESP32?
  2. Is the mobile easy to connect the AP?
  3. If the issue is easy to reproduce, check whether the mobile phone can find the AP when ESP32 fails to find the AP (when the reason code is 201).

@FayeY
Copy link
Collaborator

FayeY commented Dec 21, 2017

Closing due to lack of response. Please feel free to reopen it if the issue persists and you can supply more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants