ESP node can't reconnect to WiFi AP with weak signal after warm boot #2757

Closed
ghtester opened this issue Nov 20, 2019 · 35 comments
Labels
Category: Stabiliy Things that work, but not as long as desired Category: Wifi Related to the network connectivity

Comments

@ghtester

As already discussed in some other issues, it looks like there's a problem in several recent firmware releases which prevents an ESP node from reconnecting to a WiFi AP with a weak signal (RSSI about -90 dBm) after a warm boot. After a cold boot (power off / on) the node connects quickly to the same AP without issue.
I tested various WiFi settings on the ESP, but the behavior stays the same.
The issue is reproducible in an environment where many WiFi APs are visible at different distances (with different signal levels). The WIFISCAN command invoked from the serial console shows many APs after a cold boot. After a warm (re)boot caused by a node crash or the REBOOT command, WIFISCAN returns only a reduced AP list. So it looks like the node's WiFi sensitivity is significantly reduced after a warm (re)boot, which makes reconnecting to an AP with a weak signal impossible.
Latest test performed with official build:

Firmware

Build:⋄ | 20104 - Mega
System Libraries:⋄ | ESP82xx Core 2_6_0, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support
Git Build:⋄ | mega-20191119
Plugins:⋄ | 79 [Normal] [Testing]
Build Md5: | 2a40b605d5a65d4f9cee7fc88ad790
Md5 check: | passed.
Build Time:⋄ | Nov 19 2019 22:04:11
Binary Filename:⋄ | ESP_Easy_mega-20191119_test_core_260_sdk222_alpha_ESP8266_4M1M.b

@TD-er TD-er added Category: Stabiliy Things that work, but not as long as desired Category: Wifi Related to the network connectivity labels Nov 20, 2019
@Sasch600xt

I can confirm.

One of my ESPs has an RSSI of about -87 to -90 and can't connect after a warm reboot.

@TD-er
Member

TD-er commented Nov 20, 2019

@Sasch600xt What core version do you use?

@Sasch600xt

ESP_Easy_mega-20191119_normal_core_260_sdk222_alpha_ESP8266_4M1M

@TD-er
Member

TD-er commented Nov 21, 2019

I did change the platformio.ini file structure, so things may have a slightly different name.
The default is now core 2.6.1 (SDK 2.2.2), and I also have some build definitions with core 2.6.1 SDK3

Maybe you can also test core 2.6.1 with both SDK versions for this issue?

@Sasch600xt

Sure. So the next release then? Tonight?

@TD-er
Member

TD-er commented Nov 21, 2019

I started a test build.
Will be ready in 45 minutes I guess.

@Sasch600xt

I will be out of the house until tonight.
So I will check as soon as possible. Maybe I can manage it tonight after I come back.

@TD-er
Member

TD-er commented Nov 21, 2019

Here is the test build: ESPEasy_mega-20191119-22-g17fbb474.zip

@Sasch600xt

I have to leave now, but I had time for 2 quick tests.

ESP_Easy_mega-20191119-22-g17fbb474_test_beta_ESP8266_4M1M
and
ESP_Easy_mega-20191119-22-g17fbb474_normal_ESP8266_4M1M

were not working.
They did not connect after a warm reboot.
Only a hard reset brought them back online.

@TD-er
Member

TD-er commented Nov 21, 2019

OK, good to know.
It makes me curious about the tests with core 2.6.1 SDK3

@Sasch600xt

Sasch600xt commented Nov 21, 2019

I did an OTA update to ESP_Easy_mega-20191119_test_core_260_sdk3_alpha_ESP8266_4M1M again and it came up after the update, so all is fine with this firmware.

@ghtester
Author

Yes, it looks like core 2.6.0 with sdk3 is a working combination, while sdk2.2.2 together with either core 2.6.0 or core 2.6.1 has the WiFi issue. I am just building a customized firmware using Vagrant, so perhaps I can share some fresh first experience with core 2.6.1 & sdk3 soon...

@ghtester
Author

ghtester commented Nov 21, 2019

OK, so the first impressions with this custom build:

Entry Info
Build:⋄ 20104 - Mega
System Libraries:⋄ ESP82xx Core 2_6_1, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support
Git Build:⋄ My Build: Nov 21 2019 17:32:57
Plugins:⋄ 37 [Normal]
Build Md5: e2d52b9dca1ae3c9e7ca431e929220
Md5 check: passed.
Build Time:⋄ Nov 21 2019 17:34:24
Binary Filename:⋄ ESP_Easy_20191121_vagrant_custom_sdk3_ESP8266_4M1M.bin

In general it somehow works: a WiFi connection to a remote AP (RSSI about -84 dBm) is made even after a warm reboot, but not that quickly, even after a cold boot. That could be due to the AP model, to be tested with another AP in a different location.
Worse, I have experienced several WDT reboots. More time is needed for better testing. But I would certainly like to create a custom build based on core 2.6.0 and sdk3.

@thomastech
Contributor

thomastech commented Nov 23, 2019

I believe I have this problem too. Using ESP_Easy_mega-20191108-36-PR_2728_test_core_260_sdk222_alpha_ESP on a NodeMCU.

Cold boots experience fast WiFi connection.
Warm boots fail to connect.
The typical RSSI at this device location is usually -80dBm or stronger (currently -74dBm).

Just speculation, but perhaps signal quality temporarily gets worse in my operating environment (walk near device, RF interference, bad mojo, etc). Then the "reboot issue" occurs and that starts this warm boot failure mode.

BTW, despite the warm reboots, I didn't experience WiFi re-connection problems with a late August build using 260_sdk3_alpha core. So at this point I think that the recent test_core_260_sdk222_alpha is involved.

  • Thomas

@Sasch600xt

Sasch600xt commented Nov 23, 2019

Which bin from today can I use? I am missing an SDK3 build for 4M1M.
I am not sure whether I can use the custom one.

@TD-er
Member

TD-er commented Nov 23, 2019

Today's build has changed the SDK version back to the July version.
So when you use a version which doesn't have a core version mentioned (or has core 2.6.1 explicitly mentioned), then it has SDK 2.2.2 from July.

See discussion here: esp8266/Arduino#5784 (comment)

@Sasch600xt

i see

@ghtester
Author

Thanks for the info. Hopefully the bad WiFi sensitivity after a warm boot will be fixed somehow in the future (if it's the same for the SDK from July).
BTW, it looks like sdk3 significantly helped with this issue, but I also experienced more unexpected reboots.

@TD-er
Member

TD-er commented Nov 23, 2019

I have 4 nodes running that core version with uptimes over 55 days and 2 with over 20 days.
So the core version is capable of running stable.
But the number of WiFi reconnects on those nodes is quite low, so maybe not entirely on-topic in this issue.

@ghtester
Author

I'm not sure which core / sdk combination you mean. I think that if the signal from the AP is strong and stable, it (almost) does not matter which core / sdk is used for an ESP Easy mega build; it works quite well and is stable. Nevertheless, the different WiFi sensitivity after a cold versus a warm boot is a really bad issue IMHO... I just flashed one of my nodes with the fresh official build:

Firmware

Build:⋄ | 20104 - Mega
System Libraries:⋄ | ESP82xx Core 2.7.0-dev stage, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support
Git Build:⋄ | mega-20191123
Plugins:⋄ | 79 [Normal] [Testing]
Build Md5: | a667330ae76d2cfa961f72db502680
Md5 check: | passed.
Build Time:⋄ | Nov 23 2019 03:49:28
Binary Filename:⋄ | ESP_Easy_mega-20191123_test_beta_ESP8266_4M1M.bin

The WiFi issue is there again, node can't reconnect to AP after a warm reboot (RSSI -89).
After a cold boot it's connected immediately to the same AP.
I'll keep it running to test the stability with core 2.7.0.

@TD-er
Member

TD-er commented Nov 23, 2019

That's not the core 2.6.1
That's running the beta version.
Please test with a version without "_beta".

@ghtester
Author

ghtester commented Nov 23, 2019

That's OK. ;-) I think beta versions should be tested as well. So far every tested core with sdk 2.2.2 has had the same WiFi "warm boot" issue. Perhaps I should find a way to automatically perform a cold boot right after a warm one... :-)

@ghtester
Author

ghtester commented Nov 29, 2019

Let me share the test results from 2 nodes with the firmware mentioned above (ESP82xx Core 2.7.0-dev stage, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support).
So far it looks very good: it seems to be quite stable under weak RSSI and performs very well.
The reconnect issue after a warm reboot is still there, but no unexpected reboot has happened so far.

The ESP Easy mega node with only BMX280 plugin and Home Assistant (openHAB) MQTT Controller, RSSI about -82:
367533284: WD : Uptime 6126 ConnectFailures 144 FreeMem 15776 WiFiStatus 3
Sending data from the BMP280 to MQTT Controller every 15 secs.

The ESP Easy mega node with more plugins, which most of the time only listens to MQTT import, RSSI about -91:
535412697: WD : Uptime 8924 ConnectFailures 884 FreeMem 12624 WiFiStatus 3
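
To compare the two nodes, the watchdog lines above can be turned into a failure rate. This is a minimal Python sketch, not anything from ESPEasy itself. It assumes the `Uptime` field is in minutes (an inference: the leading millisecond timestamps divided by 60000 land within a minute of the uptime values) and that the field layout is exactly as in the two quoted lines. The function names are made up for illustration.

```python
import re

# Pattern for the watchdog ("WD") log lines quoted above.
WD_LINE = re.compile(
    r"(?P<millis>\d+): WD\s*: Uptime (?P<uptime>\d+) "
    r"ConnectFailures (?P<failures>\d+) FreeMem (?P<free_mem>\d+) "
    r"WiFiStatus (?P<wifi_status>\d+)"
)


def parse_wd_line(line):
    """Return the numeric fields of one WD log line as a dict of ints."""
    match = WD_LINE.search(line)
    if match is None:
        raise ValueError("not a WD log line: %r" % line)
    return {name: int(value) for name, value in match.groupdict().items()}


def failures_per_hour(stats):
    """Connect failures per hour of uptime.

    Assumption: 'uptime' is in minutes (inferred from the timestamps,
    not confirmed anywhere in this thread).
    """
    hours = stats["uptime"] / 60.0
    return stats["failures"] / hours


if __name__ == "__main__":
    lines = [
        "367533284: WD : Uptime 6126 ConnectFailures 144 FreeMem 15776 WiFiStatus 3",
        "535412697: WD : Uptime 8924 ConnectFailures 884 FreeMem 12624 WiFiStatus 3",
    ]
    for line in lines:
        stats = parse_wd_line(line)
        print(stats["uptime"], round(failures_per_hour(stats), 2))
```

Under that assumption the RSSI -82 node sees roughly 1.4 connect failures per hour and the RSSI -91 node roughly 5.9 per hour.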

TD-er added a commit to TD-er/ESPEasy that referenced this issue Nov 30, 2019
The WiFi.disconnect() ensures that the WiFi is working correctly. If this is not done before receiving WiFi connections,
those WiFi connections will take a long time to make or sometimes will not work at all.
@TD-er
Member

TD-er commented Nov 30, 2019

In another repo, I came across a comment next to the WiFi.disconnect(); call in the setup() function.
See the PR I just made: #2789

Maybe it can be tested to see if it does fix this issue?
Please try this test build ESPEasy_mega-20191130-2-PR_2789.zip

@thomastech
Contributor

thomastech commented Nov 30, 2019

> Please try this test build ESPEasy_mega-20191130-2-PR_2789.zip

I've installed ESP_Easy_mega-20191130-2-PR_2789_test_ESP8266_4M_VCC.bin on a NodeMCU and will report back on the test results.

  • Thomas

@ghtester
Author

ghtester commented Dec 2, 2019

As already mentioned in another thread, I quickly tested ESP_Easy_mega-20191130-3-PR_2792_test_beta_ESP8266_4M1M.bin and can confirm that the issue with limited WiFi connectivity after a warm boot is still there (it looks to be related to SDK 2.2.2).

TD-er added a commit to TD-er/ESPEasy that referenced this issue Dec 3, 2019
The WiFi.disconnect() ensures that the WiFi is working correctly. If this is not done before receiving WiFi connections,
those WiFi connections will take a long time to make or sometimes will not work at all.
TD-er added a commit that referenced this issue Dec 3, 2019
[WiFi] Call WiFi disconnect in setup() (#2757)
@thomastech
Contributor

Test update:

After ~3.5 days my ESP_Easy_mega-20191130-2-PR_2789_test_ESP8266_4M_VCC.bin (on a NodeMCU) has experienced a warm / soft reboot.

The device appears to have rebooted with partial WiFi connectivity because I received an email from it that announced the reboot. But web access is broken.

Initially I saw partial ESPEasy information from the browser. But after a couple refreshes all web access stopped (browser access times out). A cold power reset has restored operation.

  • Thomas

@TD-er
Member

TD-er commented Dec 5, 2019

I just got an idea about what may be different between a cold and a warm boot for wifi reconnects.
Can you test a few times (with some minutes interval in between) to run a wifi scan from the tools page?
Preferably with Eco mode enabled to increase the effect I'm thinking about.
Does the AP you've configured appear in the list? (Both of them, if you have set up more than one.)

When running the most recent builds (test builds, not even the nightlies, for example this one: ESPEasy_mega-20191130-17-PR_2798.zip)
then the wifi scan will also store in RTC memory the strongest AP you have configured.
So when you then click the wifi disconnect button (or command WifiDisconnect from serial), then the unit will disconnect and reconnect to the last found strongest AP. (reconnect takes about 300 msec)

If you do this too frequently (within 5 minutes), the "next" configured AP will be selected, even though it is not the strongest signal.

So in short:

  • WiFi scan will perform an "active" scan (more on that later) and set the strongest known SSID in RTC as preferred AP to connect to.
  • WiFi reconnect will take the preferred one from RTC
  • A connection is considered "stable" after 5 minutes.
  • If a node reconnects to WiFi while not considered "stable", the attempt will be considered a new attempt.
  • For attempt > 1 && attempt modulo 2 == 0 the RTC preferred AP is deleted and the 'next' AP is chosen.
  • If there is no preferred AP in RTC, then the unit will perform a standard "WiFi.begin", which does a full scan (Arduino code, not my code). N.B. This may also be the default for "hidden SSID" units on cold boot.
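
The selection rules in the list above can be modelled in a few lines. This is only a Python sketch of one reading of those rules, not the ESPEasy implementation. The class and field names are hypothetical, and returning `None` stands in for "no preferred AP in RTC, fall back to a plain WiFi.begin with a full scan".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

STABLE_AFTER_S = 5 * 60  # a connection counts as "stable" after 5 minutes


@dataclass
class ApSelector:
    """Toy model of the preferred-AP bookkeeping described above."""
    configured: List[str]                 # SSIDs set up on the node
    rtc_preferred: Optional[str] = None   # models the preferred AP kept in RTC
    attempt: int = 0
    next_index: int = 0

    def on_scan(self, results: Dict[str, int]) -> None:
        """results maps SSID to RSSI; remember the strongest configured SSID."""
        known = {s: r for s, r in results.items() if s in self.configured}
        if known:
            self.rtc_preferred = max(known, key=known.get)
            self.next_index = self.configured.index(self.rtc_preferred)

    def on_reconnect(self, connected_for_s: float) -> Optional[str]:
        """Return the SSID to try, or None for a full-scan fallback."""
        if connected_for_s >= STABLE_AFTER_S:
            self.attempt = 1          # previous connection was stable
        else:
            self.attempt += 1         # still unstable: counts as a new attempt
        if self.configured and self.attempt > 1 and self.attempt % 2 == 0:
            # Drop the RTC preference and move on to the 'next' configured AP.
            self.rtc_preferred = None
            self.next_index = (self.next_index + 1) % len(self.configured)
            return self.configured[self.next_index]
        return self.rtc_preferred
```

In this model a node that reconnects too often ends up alternating between the "next" configured AP and a full scan, regardless of which AP has the strongest signal, which matches the warning above about disconnecting more than once within 5 minutes.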

OK, now the idea I have.
What if we should perform a "passive" scan instead of an "active" one?
A passive scan just waits for as long as the timeout (default 200 msec for ESP8266, 300 for ESP32) for an AP to send its beacon signal (typically at a 102.4 msec interval, but this may differ between brands).
The active scan (which we perform) sends out a "ping" (a probe request) asking all APs to announce themselves.
So an active scan can be shorter than the timeout, but it can also result in fewer APs being found.
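
The timing side of a passive scan can be checked with some simple arithmetic. This is a rough Python sketch using the numbers quoted above. The channel count of 13 is an assumption (it depends on the regulatory region), the function names are made up, and the model ignores lost or delayed beacons, which is exactly what matters at weak RSSI.

```python
import math

BEACON_INTERVAL_MS = 102.4  # typical interval quoted above; may differ per AP


def guaranteed_beacons(dwell_ms, beacon_interval_ms=BEACON_INTERVAL_MS):
    """Minimum number of beacons sent inside one listening window.

    Holds for any phase between window and beacon schedule, assuming
    perfectly periodic beacons that are always received.
    """
    return math.floor(dwell_ms / beacon_interval_ms)


def passive_scan_time_ms(dwell_ms, channels=13):
    """Total time to listen on every channel once (channel count assumed)."""
    return dwell_ms * channels


if __name__ == "__main__":
    for name, dwell in (("ESP8266", 200), ("ESP32", 300)):
        print(name, guaranteed_beacons(dwell), passive_scan_time_ms(dwell))
```

With the default timeouts, at least one beacon (ESP8266) or two beacons (ESP32) per AP fall inside each window, but an AP configured with a beacon interval longer than the timeout can be missed entirely.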

Now what I am curious about:
When you perform a wifi scan from the tools page and one or both of the APs you have configured are not listed, what happens when you force a WiFi disconnect?
Will it try for a long time to connect to something that can hardly be reached?

What I can change:

  • Force the WiFi module to be fully awake when scanning and/or connecting
  • Use passive mode for scanning
  • Scan per channel and evaluate the result (what parameters to use?)

What may affect the tests:

  • Reset WiFi on connection loss (Restart WiFi Lost Conn:), which does turn WiFi off and on again
  • Eco mode (does activate the WiFi sleep mode after a while)
  • B/G only mode (may result in AP not reacting every beacon interval)

So a lot to consider and I hope my braindump here is not too chaotic to follow ;)

@ghtester
Author

ghtester commented Dec 5, 2019

Well, thanks for the hints to test, I'll try to find some time to read your message carefully and test at least part of the suggested things.
I am not sure the RTC can even help with the bad WiFi sensitivity after a warm boot when SDK2.2.2 is used for a firmware build. Yes, in general it's a nice feature for quick reconnecting, if it reliably works as designed. But couldn't somebody (among the developers) find the relevant difference between SDK2.2.2 and the other SDKs without this issue?

@thomastech
Contributor

> Can you test a few times (with some minutes interval in between) to run a wifi scan from the tools page?
> Preferably with Eco mode enabled to increase the effect I'm thinking about.
> Does the AP you've configured appear in the list?

I tried several times over a two hour period, Eco Mode temporarily enabled. My WiFi router always appears in the list (only one router is configured on my devices).

  • Thomas

@TD-er
Member

TD-er commented Dec 5, 2019

> But couldn't somebody (of developers) find the related difference between SDK2.2.2 and other SDKs without this issue?

Well, I have not been digging deep in the differences between SDK2.x and SDK3.
And even if I did, I cannot look into the WiFi code as that's proprietary and closed source.
I've been debugging WiFi issues the last 20 months with "black box debugging", which does resemble the debugging style of "writeln("blaat"); writeln("blaat2");" and looking at the output.

> I am not sure if the RTC can even help with the bad WiFi sensitivity after a warm boot when the SDK2.2.2 is used for a firmware build. Yes in general it's a nice feature for a quick reconnecting, if it will reliably work as designed.

The main reason I added it (apart from the possibility to save energy on battery powered nodes) is to try and fix this issue we're dealing with here.
The WiFi settings stored in RTC remain intact across warm reboots (crashes included) and remove the need to scan for WiFi networks.
It simply knows the last BSSID and channel used and also what SSID settings were used.
So the first 2 attempts will be to connect to the same AP as the last successful connection before the reboot or lost connection.
This also means you are not depending on whether the AP will react during the short scan interval, which can sometimes be an issue.

> I tried several times over a two hour period, Eco Mode temporarily enabled. My WiFi router always appears in the list (only one router is configured on my devices).

OK, so at least for your setup it may not be a factor to change the scan mode from active to passive.

@ghtester
Author

ghtester commented Apr 6, 2020

Let me share an update with some recent FW builds:

BAD = bad WiFi sensitivity after a warm (re)boot
OK = WiFi sensitivity is still the same (good) after a cold or a warm (re)boot

BAD
Build:⋄ 20106 - Mega
System Libraries:⋄ ESP82xx Core a04c3244, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support
Git Build:⋄
Plugin Count:⋄ 82 [Normal] [Testing]
Build Md5: 138327be07fcd8e807a677412dc247
Md5 check: passed.
Build Time:⋄ Mar 29 2020 04:22:08
Binary Filename:⋄ ESP_Easy_mega-20200328-6-PR_2972_test_beta_ESP8266_4M1M.bin

BAD
Build:⋄ 20105 - Mega
System Libraries:⋄ ESP82xx Core a04c3244, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support
Git Build:⋄ mega-20200328
Plugin Count:⋄ 82 [Normal] [Testing]
Build Md5: b8bb1bd39cd2df423cee65ea1b81fcc
Md5 check: passed.
Build Time:⋄ Mar 28 2020 02:33:26
Binary Filename:⋄ ESP_Easy_mega-20200328_test_beta_ESP8266_4M1M.bin

OK
Build:⋄ 20104 - Mega
System Libraries:⋄ ESP82xx Core 2_6_3, NONOS SDK 3.0.0-dev(c0f7b44), LWIP: 2.1.2 PUYA support
Git Build:⋄ My Build: Mar 11 2020 10:22:28
Plugin Count:⋄ 37 [Normal]
Build Md5: 53f44a927343c969efcca48142a883c
Md5 check: passed.
Build Time:⋄ Mar 11 2020 10:24:06
Binary Filename:⋄ ESP_Easy_20200311_vagrant_custom_sdk3_ESP8266_4M1M.bin

OK
Build:⋄ 20105 - Mega
System Libraries:⋄ ESP82xx Core 3d128e5c, NONOS SDK 2.2.2-dev(a58da79), LWIP: 2.1.2 PUYA support
Git Build:⋄ mega-20200328
Plugin Count:⋄ 13 [Normal] [Minimal, IR with AC]
Build Md5: 4d3a6ba6ad3029a3ed908269f9c98d83
Md5 check: passed.
Build Time:⋄ Mar 28 2020 02:09:47
Binary Filename:⋄ ESP_Easy_mega-20200328_minimal_IRext_ESP8266_4M1M.bin

OK
Build:⋄ 20106 - Mega
System Libraries:⋄ ESP82xx Core 3d128e5c, NONOS SDK 2.2.2-dev(a58da79), LWIP: 2.1.2 PUYA support
Git Build:⋄
Plugin Count:⋄ 16 [Normal]
Build Md5: e47e18c32043ebacc36819bc61a8eed
Md5 check: passed.
Build Time:⋄ Mar 29 2020 03:38:56
Binary Filename:⋄ ESP_Easy_mega-20200328-6-PR_2972_custom_ESP8266_4M1M.bin

So it looks like SDK 2.2.2-dev(a58da79) fixed the WiFi issue reported above. Until now I had to use SDK 3.0.0 (which was not recommended for use) for custom firmware builds to avoid the bad WiFi sensitivity after a warm (re)boot, which happens to me rather often due to exceptions.

@TD-er
Member

TD-er commented Apr 6, 2020

Another striking correlation is that a high plugin count correlates with bad WiFi stability.

Not sure that it has something to do with it, just that it is a surprising correlation seen in your tests.

@ghtester
Author

ghtester commented Apr 13, 2020

It's interesting to me that the latest custom build also reconnects OK even after warm (re)boot with the same SDK release...
So hopefully the issue is fixed and we could close this case?


OK

Build:⋄ 20106 - Mega
System Libraries:⋄ ESP82xx Core 5511180c, NONOS SDK 2.2.2-dev(bb83b9b), LWIP: 2.1.2 PUYA support
Git Build:⋄ My Build: Apr 11 2020 10:05:33
Plugin Count:⋄ 36 [Normal]
Build Md5: bc161e1b8b7984d07379d96b34972be5
Md5 check: passed.
Build Time:⋄ Apr 11 2020 10:07:20
Binary Filename:⋄ ESP_Easy_20200411_vagrant_custom_beta_ESP8266_4M1M.bin
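
Putting this build next to the five reported on Apr 6 gives a quick consistency check of the two explanations discussed so far (SDK revision versus plugin count). This is a small Python sketch using only the values quoted in this thread. The helper names are made up, and six builds that also differ in core revision are of course no proof of anything.

```python
# (SDK revision, plugin count, result) as reported in this thread.
BUILDS = [
    ("2.2.2-dev(bb83b9b)", 82, "BAD"),  # mega-20200328-6-PR_2972 test_beta
    ("2.2.2-dev(bb83b9b)", 82, "BAD"),  # mega-20200328 test_beta
    ("3.0.0-dev(c0f7b44)", 37, "OK"),   # 20200311 vagrant custom sdk3
    ("2.2.2-dev(a58da79)", 13, "OK"),   # mega-20200328 minimal_IRext
    ("2.2.2-dev(a58da79)", 16, "OK"),   # mega-20200328-6-PR_2972 custom
    ("2.2.2-dev(bb83b9b)", 36, "OK"),   # 20200411 vagrant custom beta
]


def results_per_sdk(builds):
    """Map each SDK revision to the set of results seen with it."""
    grouped = {}
    for sdk, _plugins, result in builds:
        grouped.setdefault(sdk, set()).add(result)
    return grouped


def plugin_count_gap(builds):
    """Return (largest plugin count among OK builds, smallest among BAD)."""
    max_ok = max(p for _sdk, p, result in builds if result == "OK")
    min_bad = min(p for _sdk, p, result in builds if result == "BAD")
    return max_ok, min_bad


if __name__ == "__main__":
    print(results_per_sdk(BUILDS))
    print(plugin_count_gap(BUILDS))
```

With the sixth build included, SDK 2.2.2-dev(bb83b9b) shows both BAD and OK results, while every OK build has at most 37 plugins and every BAD build has 82.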

@TD-er
Member

TD-er commented Apr 13, 2020

Let's hope so.
Maybe you can also test a few nightly build files, to make sure it isn't a build issue.

amicol added a commit to amicol/ESPEasy that referenced this issue Sep 23, 2020
Release mega-20191208

Changes in mega-20191208 (since mega-20191130):

Gijs Noorlander (17):
      [WiFi] Use last known BSSID & channel from RTC + MQTT fixes
      [PIO] Update to core 2.6.2
      [WiFi] Improve ESP32 WiFi connect + fix mDNS updates
      [PIO] Fix build failure due to incorrect flags.
      [PIO] Don't use Python 3 specific since Travis still uses Python 2.7
      [WiFi] Call WiFi disconnect in setup() (letscontrolit#2757)
      [Rules] Parse template for all command calls
      [Rules] Execute some events asynchronous
      [Commands] Proper error logs when processing commands.
      [Commands] Add flag to be a bit more tolerant in last argument parsing
      [Events] Run events from rules immediately + add AsyncEvent command
      [FEATURE_SD] Add build in Platformio.ini with SD enabled (letscontrolit#2700 )
      [Rules] Add some checks for rules consistency before saving
      [Rules] Add check for correct received POST data when saving rules
      [JSON] Fix issues with generated JSON
      [JSON] Fix non valid timing stats json when no plugins defined (letscontrolit#2767)
      [JSON] Extend timing stats JSON end point with controller and misc stats

TD-er (1):
      automatically updated release notes for mega-20191208

stefan (1):
      Fix Arduino IDE include path
4 participants